EDITOR’S PREFACE

In prefacing Part I of the Sixteenth Yearbook it was stated 
that the policy of the Society to give preference in its Yearbooks to 
contributions that would disseminate the reports of important com- 
mittees in advance of the meetings at which they are to be discussed
had been favorably received both by members of the Society and by
large numbers of the educational public. It was also stated as “not
unlikely that other committees and organizations of men profes-
sionally active in various aspects of educational endeavor will be
glad to make similar use of the Society’s avenues of publication in
the future.” This expectation has been realized in a most gratify-
ing way in the Seventeenth Yearbook. I wish to congratulate mem-
bers of the Society on this opportunity to serve themselves as well 
as the cause of education by cooperating with the National Associ- 
ation of Directors of Educational Research in the publication of the 
Yearbook these investigators have prepared. 


G. M. Whipple



COMMITTEE’S PREFACE 

The National Association of Directors of Educational Re- 
search present this Yearbook on Educational Measurement to the 
superintendents and teachers of American schools, hoping that it 
may prove of practical value to them in their work. Its purpose is
to gather into one handy volume a rather complete statement of the 
various aspects of a new movement which seems destined to have 
a profound and permanent influence upon American education. 
Each chapter has been written by a different member of the Associ- 
ation, and as in any new field of work, complete agreement in either 
theory or practice is not to be expected, so in this volume the careful 
reader will detect many evidences of healthy variations in ideals, 
aims and methods. However, it is believed that these differences 
are not serious enough to mar the unity of plan and content and 
that the book as a whole represents the best judgment of the Associ- 
ation as to what information is of greatest practical worth. The 
Yearbook is issued in the hope that it may further the cause for 
which the Association stands: the promotion of educational re-
search in American public schools. 

The editorial committee, on behalf of the Association, hereby 
gratefully acknowledges its indebtedness to the National Society for 
the Study of Education, whose cooperation has made the publica- 
tion of a yearbook possible. 

EDITORIAL COMMITTEE 
Stuart A. Courtis, Chairman,
Leonard P. Ayres,
B. R. Buckingham,



CHAPTER I 

HISTORY AND PRESENT STATUS OF 
EDUCATIONAL MEASUREMENTS 


LEONARD P. AYRES

Division of Education, Russell Sage Foundation, New York City


Measurements in education are fifty years old if we count from
the oldest beginnings of which we have record. They are twenty-
five years old if we reckon from the time that Dr. Rice, the pioneer
and pathmaker among American scientific students of education,
began his work in this field. They are ten years old if we begin our
count with the earliest efforts of Professor Thorndike, who is the
father of the present movement.

We are indebted to Professor Thorndike for having discov- 
ered what is apparently the earliest record of work in the field of 
educational measurements as we now use that term. As early as 
1864 a school master in England, the Rev. George Fisher, of the 
Greenwich Hospital School, had seen the need and possibilities of 
standards, and with prophetic foresight anticipated present-day 
achievements. His practice was as follows: “A book called the 
‘Scale-Book’ has been established, which contains the numbers as-
signed to each degree of proficiency in the various subjects of exam-
ination: for instance, if it be required to determine the numerical
equivalent corresponding to any specimen of ‘writing,’ a compari-
son is made with the various standard specimens, which are
arranged in this book in order of merit; the highest being repre-
sented by the number 1, and the lowest by 5, and the intermediate
values by affixing to these numbers the fractions ¼, ½, or ¾. So
long as these standard specimens are preserved in the institution, 
so long will instant numerical values for proficiency in ‘writing’ be 
maintained. And since fac-similes can be multiplied without limit, 
the same principle might be generally adopted. 

“The numerical values for ‘spelling’ follow the same order
and are made to depend upon the percentage of mistakes in writing
from dictation sentences from works selected for the purpose, exam-
ples of which are contained in the ‘Scale-Book,’ in order to preserve
the same standard of difficulty.

“By a similar process values are assigned for proficiency in
mathematics, navigation, Scripture knowledge, grammar and com-
position, French, general history, drawing, and practical science,
respectively. Questions in each of these subjects are contained in
the ‘Scale-Book,’ to serve as types, not only of the difficulty, but of
the nature of the questions, for the sake of future reference; observ-
ing that the same numerals are used in the same order as before, viz.,
number 1 denotes the highest, and number 5 the lowest amount
of attainment.

“In respect to the numerical values of ‘reading,’ as regards 
accuracy, taste or judgment, it is obvious that no other standard of 
measurement can be applied, beyond the interpretation of the terms 
‘good,’ ‘bad,’ ‘indifferent,’ etc., existing at the period of examin- 
ation. And the same observation will apply to the estimation of 
numbers of ‘characters’ and ‘natural abilities,’ as determined by 
the united testimony of the respective masters. 

“Having stated this much with regard to the plan pursued in 
this school, I may well add that the advantage derived from this 
numerical mode of valuation, as applied to educational subjects, is
not confined to its being a concise method of registration, combined
with a useful approximation to a fixed standard of estimation, appli-
cable to each boy; but it affords also a means of determining the
sum total, and therefrom the mean or average condition or value
of any given number of results.”* 
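Fisher's closing point, that numerical marks allow a sum total and an average to be computed, can be illustrated with a small sketch. Only the 1 (highest) to 5 (lowest) range and the quarter-step fractions come from the quotation; the function names and the marks themselves are hypothetical, chosen for illustration.

```python
from fractions import Fraction
from typing import List

# Scale-book values: whole numbers 1 (highest) to 5 (lowest), with
# intermediate values formed by adding 1/4, 1/2, or 3/4.
QUARTER = Fraction(1, 4)

def is_scale_value(v: Fraction) -> bool:
    """True if v lies between 1 and 5 and falls on a quarter step."""
    return Fraction(1) <= v <= Fraction(5) and (v / QUARTER).denominator == 1

def class_average(marks: List[Fraction]) -> Fraction:
    """Sum the marks and divide by their number: the 'mean or average'."""
    if not marks:
        raise ValueError("no marks to average")
    for m in marks:
        if not is_scale_value(m):
            raise ValueError(f"{m} is not a valid scale-book value")
    return sum(marks, Fraction(0)) / len(marks)

# Hypothetical 'writing' marks for four boys: 2, 2½, 3¼, 2¼.
marks = [Fraction(2), Fraction(5, 2), Fraction(13, 4), Fraction(9, 4)]
print(class_average(marks))  # 5/2
```

Exact fractions are used so that quarter-step marks average without rounding error.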

Mr. Fisher’s efforts seem to have produced no lasting results. 
Progress in the scientific study of education was not possible until 
people could be brought to realize that human behavior was suscep- 
tible of quantitative study, and until they had statistical methods 
with which to carry on their investigations. Both of these were 
contributed in large measure by Sir Francis Galton. As early as 
1875 he published scientific studies of the traits of twins, of
number-forms, of color-blindness, and of the efficacy of prayer. Out
of his work came much of experimental and educational psychology,
and indirectly, educational measurements. It was he who developed
the statistical methods necessary for the quantitative study of
material which seemed at the outset entirely qualitative and not at
all numerical in nature.

*Reported by E. B. Chadwick in the Museum, a Quarterly Magazine of
Education, Literature and Science, Vol. III, 1864. See also Journal of
Educational Psychology, Vol. 4, page 551.

In America the real inventor of the comparative test was Dr.
J. M. Rice. Dr. Rice studied in Germany and came under the influ-
ence of the German psychologists at Jena and Leipsic. Returning
to this country, he became interested in education and one day in
1894 the new idea was born. Of this invention Dr. Rice says: “In
truth, however, I came to recognize that this (the claims of school
men following different courses of study) was all talk; that no one
really knew the facts, because there were no standards to serve as
guides. Then one day, the idea flashed through my mind that the
way to settle the question was to try it out. For a beginning I de-
cided to take spelling, and on that very day I made up a list of 50
words with the view of giving them as a test to the pupils of the
schools as I went on my tour from town to town. I have no record
of the date of the inspiration, but I think it was some time in Octo-
ber, 1894.”

Dr. Rice’s work, however, did not meet with the approval of
the educators of the day. One of his earlier reports in this field
indicated that children who had spent thirty minutes a day for
eight years in the study of spelling did not at the end of that time
spell any better than the children in another school system who had
spent only fifteen minutes a day for eight years in the same study.
The presentation of these results brought upon the investigator
almost unlimited attack. The educators who discussed his findings
and those who reviewed them in the educational press united in
denouncing as foolish, reprehensible, and from every point of view
indefensible, the effort to discover anything about the value of the
teaching of spelling by finding out whether or not the children could
spell. They claimed that the object of such work was not to teach
children to spell, but to develop their minds! It was the issue be-
tween the investigator and the formalist in education, and the con-
flict that is still under way is the conflict that was then for the first
time clearly defined.

Little by little the more thoughtful men in the field of educa-
tion appreciated the suggestive value of Dr. Rice’s work and some
few of them, notably Professor Hanus of Harvard, dared to come to
his support. Slowly the tide turned in his favor, until by common
consent the general validity of his conclusions was tentatively ac-
cepted. His methods, however, were not generally adopted, and
for more than ten years but little progress was made beyond the
work of the pioneer himself.

If Dr. Rice is to be called the inventor of educational measure-
ment, Professor E. L. Thorndike should be called the father of the
movement. In 1895, Professor Thorndike was a student at Col-
umbia, struggling with statistical methods in a course on measure-
ments under Boas, and “finding it new and very hard for me to
learn.”* His interests were in the field of psychology and the work
of Rice made a deep and lasting impression. Gradually his experi-
mental work came more and more into the educational field. He
began to preach the need of measurement and to experiment with
tests and scales. The Stone Arithmetic Tests were published in 1908.
The Thorndike Scale for the measurement of merit in handwriting
was presented before Section L of the American Association for the
Advancement of Science at its Boston meeting, in December, 1909,
and was published in the Teachers College Record the following
March. The construction of this scale, based on the equal difference
theorem formulated by Cattell, marks the real beginning of the sci-
entific measurement of educational products.

During the past ten years the growth of the scientific movement
in education has been continuous and rapid. It has been closely
related to the survey movement which had its real inception in 1907
in a great social study of the city of Pittsburgh, which was termed
“A Survey.” Three years later two college professors, Hanus of
Harvard and Moore of Yale, conducted studies of the school systems
of Montclair and East Orange in New Jersey. These studies dif-
fered from earlier investigations of school systems in that their
purpose was to tell the public about their public schools, and each
investigator, borrowing the term from the contemporary social
movement, used the word “survey” to designate a section of his
report. In the years that have followed, scores of surveys of city,
state and county school systems have been conducted, and in ever
increasing degree they have utilized, as perhaps the most important
of their methods, the scales and tests used in the measurement of
educational processes and products.

*Quotation from a personal letter.

The two movements were represented in the New York school 
inquiry of 1911-12. For the first time in a formal educational in- 
vestigation tests were used as an aid in evaluating the results of 
public-school work. These were the Courtis arithmetic tests, which
had by that time attracted a good deal of attention. Their success- 
ful use in the New York survey not only settled all doubts as to the 
availability of the tests themselves for the measurement of educa- 
tional attainment, but also firmly established the principle that in 
conducting school surveys scientific tests must be utilized where 
they are available. 

One of the recommendations of the survey committee in New 
York City was that a Bureau of Research be established to conduct 
a continuous survey, from within the school system itself and for its 
benefit. This recommendation was immediately adopted and the 
Bureau organized in September, 1913. Previous to this time, there 
had been in various cities committees and other organizations, which
had made studies of various phases of administrative and instruc- 
tional work, but to New York City probably belongs the credit of 
first establishing a formal organization having for its purpose the 
continuous critical study of its own activities by scientific methods. 

By this time, Boston, Detroit, and many other cities were ex-
perimenting with measurement and obtaining results of value. 
Other bureaus were soon established. The Division of Education of 
the Russell Sage Foundation had turned its attention to work of this 
type as early as 1907, the Boston Bureau was organized in 1913, 
and similar organizations in Detroit, Kansas City, and Oakland 
soon followed. During these same years faith and interest in meas- 
urements had been greatly stimulated by the development of the 
Binet-Simon tests and by the wide-spread attention given the study 
of retardation and elimination in school systems. A demand for 
men trained for the work was created. Superintendents and teach-
ers also were clamoring for technical knowledge of methods and for 
explanations of the results obtained. For ten years graduates of 
Teachers College, Columbia University, and of the School of Edu-
cation of Chicago University had had impressed upon them that 
measurement and scientific experimentation were highly desirable 
in education and they had been at least partly prepared for such 
work. Next, this training was expanded into formal courses in edu-
cational measurement. Soon courses in measurement appeared in all 
the great universities and the movement began to gain full momen- 
tum. 

The meetings of the Department of Superintendence of the 
National Education Association afford an excellent index of the 
progress of the movement. Dr. Rice’s report in 1897 was received
with derision. The Philadelphia meeting in 1912, after a heated 
discussion, voted against measurement by a small majority, but two 
years later a committee on Tests and Standards made a favorable 
report which was adopted by a considerable majority. 

Today tests and scales are used throughout this country and 
around the world. In England, Germany, and France, before the
war, beginnings had been made. Scales for the measurement of 
Chinese writing and composition are now in process of construction. 
In Australia and New Zealand, in India and Hawaii, and through- 
out the length and breadth of the United States and Canada, tests 
and scales are in daily service, proving valuable tools in the hands 
of those who know how to use them. 

The scientific method is at base analytic scrutiny, exact meas- 
uring, careful recording, and judgment on the basis of observed 
fact. Science in education is not a body of information, but a
method, and its object is to find out and to learn how. By its aid,
education is becoming a profession. Courses of study are being
adapted to the needs of children; teaching effort and supervisory
control are becoming more efficient. The center of interest in edu-
cation has become the child, rather than the teacher, and efforts to
improve the quality of instruction begin by finding out what the
children can do, rather than by discussing the methods by which the
teacher proceeds. 
Educational measurement has been accepted by the American
public. This Yearbook is in itself a proof of that. That the meth- 
ods of today are still crude and imperfect must be admitted by even 
the most enthusiastic supporters of the movement. They deal most
effectively with only the simple mechanical skills, and even here 
they are still far from perfect. Nevertheless, they are extending 
each year their range of availability and their field of application. 

The importance of the movement lies not only in its past and 
present achievements, but in the hope of the future. Knowledge is 
replacing opinion, and evidence is supplanting guess-work in edu- 
cation as in every other field of human activity. This is the su- 
preme fact to which this Yearbook bears witness. The future de- 
pends upon the skill, the wisdom, and the sagacity of the school men 
and women of America. It is well that they should set about the 
task of enlarging, perfecting, and carrying forward the scientific 
movement in education, for the great war has marked the end of 
the age of haphazard, and the developments of coming years will 
show that this is true in education as in every other organized field 
of human endeavor. 



CHAPTER II

THE NATURE, PURPOSES, AND GENERAL METHODS OF
MEASUREMENTS OF EDUCATIONAL PRODUCTS


Edward L. Thorndike

Professor of Educational Psychology, Teachers College, Columbia University 


Whatever exists at all exists in some amount. To know it 
thoroughly involves knowing its quantity as well as its quality. 
Education is concerned with changes in human beings; a change 
is a difference between two conditions; each of these conditions is 
known to us only by the products produced by it: things made,
words spoken, acts performed, and the like. To measure any of
these products means to define its amount in some way so that com-
petent persons will know how large it is, better than they would
without measurement. To measure a product well means so to de-
fine its amount that competent persons will know how large it is,
with some precision, and that this knowledge may be conveniently
recorded and used. This is the general Credo of those who, in the
last decade, have been busy trying to extend and improve measure-
ments of educational products.*

We have faith that whatever people now measure crudely by 
mere descriptive words, helped out by the comparative and superla- 
tive forms, can be measured more precisely and conveniently if in- 
genuity and labor are set at the task. We have faith also that the 
objective products produced, rather than the inner condition of the 
person whence they spring, are the proper point of attack for the 
measurer, at least in our day and generation. 

This is obviously the same general creed as that of the physicist
or chemist or physiologist engaged in quantitative thinking; the
same, indeed, as that of modern science in general. And, in general,
the nature of educational measurements is the same as that of
all scientific measurements.

*The conditions to be thoroughly known must be known as quantities $a$, $b$,
$c$, $d$, etc., of qualities, or powers, or skills, or knowledges $A$, $B$, $\Gamma$,
$\Delta$, etc.; that is, as an equation $aA + bB + c\Gamma$, etc.

In detail, however, there are notable differences. An educa- 
tional product, such as a composition written, a solution of a prob- 
lem in arithmetic, an answer to a question about history, a drawing 
of a house or the performance of an errand, is commonly a complex 
of many sorts of things. The task of measuring it seems more like 
measuring a house or an elephant than it is like measuring a length 
or a volume or a weight. A complete measurement of, say, a composi- 
tion might include an exact definition of its spelling, its usage of 
words, its usage of word forms, its wit, its good sense and so on and 
on ; and each of these might again be subdivided into a score or more 
of component elements. 

What we do, of course, is to make not such a complete meas- 
urement of the total fact, but to measure the amount of some fea- 
ture, e.g., the general merit of the composition or the richness of its 
vocabulary, just as physical science does not measure the elephant, 
but his height, or his weight, or his health, or his strength of pull. 
Every measurement represents a highly partial and abstract treat- 
ment of the product. This is not understood by some of our critics 
who object to tests and scales because of their limited point of view. 
The critic’s real point should be that an educational product com- 
monly invites hundreds of measurements, as we all well know. It 
should be noted also that single measurements are still in a sense 
complex, being comparable to volume, wattage or the opsonic index, 
rather than to length, weight or temperature. 

In the second place, the zeros of the scales for the educational 
measures and the equivalence of their units are only imperfectly 
known. As a consequence, we can add, subtract, multiply and di-
vide educational quantities with much less surety and precision than 
is desirable. Indeed, in any given case, the sense in which one edu- 
cational product is twice as good or as desirable as another, or in 
which one task is twice as hard as another, or in which one improve- 
ment is twice as great as another, is likely to be a rather intricate and 
subtle matter, involving presuppositions which must be kept in 
mind in any inferences from the comparison. 
In some cases so little is known of units of amount that we do
not even try to equate distances along the scale, but simply express 
relative size in terms of arbitrarily chosen units and reference 
points.* This is the case, for example, with the most commonly 
used measurement in psychology and education, that due to apply- 
ing the Binet-Simon tests. 

Nobody need be disturbed at these unfavorable contrasts be- 
tween measurements of educational products and measurements 
of mass, density, velocity, temperature, quantity of electricity, and 
the like. The zero of temperature was located only a few years 
ago, and the equality of the units of the temperature-scale rests 
upon rather intricate and subtle presuppositions. At least, I ven- 
ture to assert that not one in four of, say, the judges of the supreme 
court, bishops of our churches, and governors of our states could 
tell clearly and adequately what these presuppositions are. Our 
measurements of educational products would not at present be en- 
tirely safe grounds on which to extol or condemn a system of teach- 
ing reading or arithmetic, but many of them are far superior to 
the measurements whereby our courts of law decide that one trade- 
mark is an infringement on another. 

There are two somewhat distinct groups of educational meas- 
urements: one, well illustrated by the Courtis tests, asks primarily
how well a pupil performs a certain uniform task; the other, well
illustrated by the Hillegas or Trabue tests, asks primarily how hard
a task a pupil can perform with substantial perfection, or with some
other specified degree of success. The former are allied to the so-
called method of average error of the psychologists; the latter, to
what used to be called the method of “right and wrong cases.”
Each of these groups of methods has its advantages, and each de- 
serves extension and refinement, though the latter seems to repre- 
sent the type which will prevail if education follows the course of 
development of the physical sciences. 
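The contrast between the two groups can be sketched in code. The first function answers "how well is a fixed task done?" (here, items right per minute); the second answers "how hard a task can be done with a specified degree of success?" by finding the hardest step of a graded series passed at a criterion. The scoring rule, the 75 percent criterion, the function names, and the data are all hypothetical, chosen only to illustrate the distinction; they are not the scoring rules of the Courtis, Hillegas, or Trabue tests.

```python
from typing import Dict, List, Optional

def performance_on_uniform_task(right: int, minutes: float) -> float:
    """First kind: how well is a fixed task done? Here, items right per minute."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return right / minutes

def hardest_step_passed(results: Dict[int, List[bool]],
                        criterion: float = 0.75) -> Optional[int]:
    """Second kind: how hard a task can be done with a given degree of success?

    results maps a difficulty step (higher = harder) to the pupil's
    right/wrong outcomes on items at that step. Returns the highest step
    whose proportion right meets the criterion, or None if no step does.
    """
    passed = [
        step for step, outcomes in results.items()
        if outcomes and sum(outcomes) / len(outcomes) >= criterion
    ]
    return max(passed) if passed else None

# Hypothetical records for one pupil.
print(performance_on_uniform_task(right=24, minutes=8))  # 3.0
pupil = {1: [True] * 4,
         2: [True, True, True, False],
         3: [True, False, False, True]}
print(hardest_step_passed(pupil))  # 2
```

Taking the highest step passed is only one simple rule; a record that passes a hard step after failing an easier one would call for a more careful treatment.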

I have so far omitted specific reference to measurements by
relative position, the so-called ‘order of merit method.’ This
method, available even where no differences in the amount of the
thing measured are defined, and useful to organize the reports of
untrained observers, is doing excellent service as a first stage in
quantitative knowledge. For every reason, however, the grading
of a set of educational products by relative position should soon
give way to their rating by some even rough scale, such as meteor-
ology uses for the cloudiness of days, and such as astronomers use
for the magnitude of stars. A very, very simple form of scale, such
as almost anybody can use to measure almost any product that he
knows anything about, is that devised by Walter Dill Scott for use
in rating the achievements or promise of employees.†

*That is, as $a$, $a+x$, $a+x+y$, $a+x+y+z$, etc., claiming only that $x$, $y$,
and $z$ are all positive quantities.

The purpose of measurements of educational products is in 
general to provide somebody with the knowledge that he needs of the 
amount of some thing, difference or relation. The “somebody” may
be a scientific worker, a superintendent of schools, a teacher, a pa- 
rent or a pupil. He may need a very precise or only an approxi- 
mate measure, according to the magnitude of the difference which he 
has to determine. He may need it for guidance in many different 
sorts of decisions and actions. 

Some of the most notable uses concern the values of studies
in terms of the changes produced by them, the effects of different 
methods of teaching, and the effects of various features of a school 
system, such as the salary scale, the length of the school day and 
year, the system of examining and promoting pupils, or the size of 
class. There are many problems under each of these heads, and 
each of these problems is multifarious according to the nature, age, 
home life and the like of the pupils, and according to the general 
constitution of the educational enterprise, some small feature of 
which is being studied. 

Another important group of uses concerns inventories of the
achievements of certain total educational enterprises such as our
educational surveys must become if they are to carry authority with
scientific men. The total educational enterprise may be the work
of a teacher, of a school, of an orphanage, of a prison, of a system
of schools, or the like.

†The Rating Scale was originally devised and is now issued as one of the
copyrighted forms of the Bureau of Salesmanship Research, Carnegie Institute
of Technology, Pittsburgh, Pa. A modification of it is in use in certain sec-
tions of the military organization of the United States for the selection of men
for promotion. A description of this military modification has appeared in
Collier’s Weekly. (G. M. W.)

Another important group of uses centers around the problem 
of giving the individual pupil the information about his own 
achievement and improvement which he needs as a motive and a 
guide. It is interesting to note that the first of the newer educa- 
tional scales, which was expected to be used chiefly by scientific in-
vestigators of the teaching of handwriting, now hangs on the wall 
of thousands of classrooms as a means for pupils to measure them- 
selves. There are many other purposes, and important ones, such 
as the detection and removal of gross prejudices on the part of 
teachers in their own evaluations of certain educational aims and 
products. These, however, cannot be described here. 

The superintendents, supervisors, principals and teachers di- 
rectly in charge of educational affairs have been so appreciative of 
educational measurements and so sincere in their desire to have 
tests and scales devised which they can themselves apply, that the 
tendency at present is very strong to provide means of measurement 
which are concerned somewhat closely with school achievements, 
and which can be used by teachers and others with little technical 
training. There is also a tendency, because of this need for a large 
number of measurements in the case of educational problems, to 
try to devise tests which can be scored by persons utterly devoid of 
judgment concerning the products in question. 

It would ill become the present writer to protest against these 
two tendencies; and they are intrinsically healthy. There is, how-
ever, a real danger in sacrificing soundness of principle and pre- 
cision of result to the demand that we measure matters of import- 
ance and measure them without requiring elaborate technique or 
much time of the measurer. The danger is that the attention of 
investigators will be distracted from the problems of pure measure- 
ment for measurement's sake, which are a chief source of progress 
in measuring anything. Perhaps not even one person in a million 
need feel this passion, but for that one to cherish it and serve it 
is far more important than for him to devise a test which thousands 
of teachers will employ. Opposition, neglect, and misunderstand- 
ing will be much less disastrous to the work of quantitative science 
in education than a vast output of mediocre tests for measuring
this, that and the other school product, of which a large percent 
are fundamentally unsound. 

We have seen that educational measurements vary from an 
assignment of a certain amount of some clearly defined thing, the 
zero, or “just not any,” of which is fairly accurately known, to
a mere assignment of a certain position in a series of products them- 
selves only similarly defined. They vary also from measurements 
in the most unimpeachable of units, such as time, to measurements 
where the unit is “that difference in quality which 75 percent of 
a certain sort of observers succeed in observing” or is even more 
crudely and hypothetically defined. They include measures in 
the form of how well a certain task is performed, and of how hard 
a task can be performed with a certain degree of success. Conse- 
quently, the methods of devising and using educational measure- 
ments also vary widely — too widely for any unified exposition. 
What will be said about methods here will, in fact, comprise only 
certain recommendations and cautions which are likely to be often 
appropriate. 
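A unit defined as "that difference in quality which 75 percent of a certain sort of observers succeed in observing" is commonly turned into scale distances by assuming that the judges' errors are normally distributed. Under that assumption, a difference that a proportion p of judges notice lies z(p) standard deviations apart, where z is the inverse of the standard normal distribution function; dividing by z(0.75), about 0.6745, expresses the distance in units of the 75 percent difference. The normality assumption and the function name below are illustrative and not taken from the text.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)
Z_75 = STANDARD_NORMAL.inv_cdf(0.75)  # about 0.6745 standard deviations

def distance_in_75_percent_units(p_judged_better: float) -> float:
    """Scale distance between two specimens when a proportion p of judges
    rate the first as better, assuming normally distributed judgments.
    The result is expressed in multiples of the 75-percent difference."""
    if not 0.0 < p_judged_better < 1.0:
        raise ValueError("proportion must lie strictly between 0 and 1")
    return STANDARD_NORMAL.inv_cdf(p_judged_better) / Z_75

print(distance_in_75_percent_units(0.75))  # 1.0 (the unit itself)
print(distance_in_75_percent_units(0.50))  # 0.0 (judges split evenly)
print(round(distance_in_75_percent_units(0.90), 2))
```

Proportions of exactly 0 or 1 are rejected because a unanimous verdict says only that the difference is large, not how large.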

Consider first certain principles of method designed to ensure 
reliable measures, or at least measures whose degree of unreliability 
is known and can be allowed for. These are:

At least two specimens or samples should be taken of any fact 
about which a statement is to be made. If any individual’s achieve- 
ment in drawing is to be reported, use at least two drawings. If 
the achievement of a class in addition is to be reported, use at least 
two tests, preferably on two days. If the effect of a method is to 
be estimated, test the method with at least two classes taught by 
different teachers. If the quality of a specimen of handwriting is 
to be reported, have at least two judges rate it independently. It 
will often appear from the comparison of two samplings of a fact 
that many more samplings are needed to permit a statement that is 
precise enough for the purpose in view. 

No fixed rules can be given, since the purpose in view deter- 
mines the degree of precision that is required, but it may be noted 
that a test which gives, for a single pupil, an approximation so 
rough as to be almost useless, gives for a class of thirty-six a result 
which is six times as precise, and for a group of nine classes a result
which is eighteen times as precise. Ten times as large a sampling 
of the product in question is required to measure a single pupil as 
to measure the average of a hundred pupils (to the same degree of 
precision). In general, eight tests of 15 minutes each are superior 
to four tests of 30 minutes each, and still more superior to two tests 
of 60 minutes each, since the accidents of particular temporary cir- 
cumstances are thus reduced in influence. 
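The figures six and eighteen follow the square-root rule for averages: if each pupil's score carries an independent chance error of the same size, the error of a class average shrinks in proportion to the square root of the number of pupils. The following is a minimal Python sketch of that arithmetic. It assumes independent, equally variable errors and takes the "group of nine classes" to be nine classes of thirty-six pupils each, a class size the text does not state for that case.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Chance error of an average of n independent scores, each with error sigma."""
    return sigma / math.sqrt(n)

def times_as_precise(n: int) -> float:
    """How many times more precise an average of n pupils is than one pupil's score."""
    return standard_error(1.0, 1) / standard_error(1.0, n)

# One pupil, one class of 36, and nine classes of 36 (class size assumed).
for n in (1, 36, 9 * 36):
    print(f"{n:4d} pupils: {times_as_precise(n):.1f} times as precise")
```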

Consider next certain principles of method that need to be observed
if we are to secure measures whose significance is certain.

Great care should be taken in deciding anything about the fate 
of pupils, the value of methods, the achievement of school systems 
and the like from the scores made in a test, unless the significance
of the test has been determined from its correlations. For example, 
it cannot be taken for granted that a high score in checking letters 
or numbers is significant of a high degree of accuracy and thorough- 
ness in general. Letter-checking tests have been so used, but with 
very little justification. Courtis has given reason to believe that a 
test with stock problems from text-books in arithmetic may be a very 
inadequate test of ability to reason with quantitative facts and re- 
lations, this ability being in such a test complicated by, and perhaps 
even swamped by, the ability to understand the verbal description 
of the facts and relations. 

A pupil’s score in a test signifies, first, such and such a particu-
lar achievement, and second, only whatever has been demonstrated
by actual correlations to be implied by it. Nothing should be taken
for granted.

The significance of one ability (A) for another (B) is given
by the correlation coefficient r_ab corrected for attenuation. The
significance of a particular test sampling (A) for the ability (B)
is given by the raw correlation coefficient r_ab. Thus, arithmetical
ability itself is significant to a high degree of promise of ability with
algebra and geometry, but a five-minute test in arithmetic would
be much less so.
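The correction for attenuation mentioned here is usually computed with Spearman's formula: divide the raw correlation by the square root of the product of the two measures' reliabilities, where a measure's reliability is its correlation with a repetition of itself. Below is a sketch in Python. The raw correlation and reliabilities in the example are invented for illustration and are not taken from any study.

```python
import math

def correct_for_attenuation(r_ab: float, r_aa: float, r_bb: float) -> float:
    """Spearman's correction for attenuation.

    r_ab: raw correlation between test sampling A and measure B
    r_aa, r_bb: reliability of A and of B (each correlated with a repetition of itself)
    Returns the estimated correlation between the underlying abilities.
    """
    if not (0 < r_aa <= 1 and 0 < r_bb <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_ab / math.sqrt(r_aa * r_bb)

# Hypothetical figures: a five-minute arithmetic test (reliability 0.60)
# correlating 0.45 with algebra marks (reliability 0.90).
print(round(correct_for_attenuation(0.45, 0.60, 0.90), 2))  # 0.61
```

The two numbers correspond to the distinction drawn above. The raw 0.45 describes what the short test sampling signifies. The corrected 0.61 estimates the relation between the abilities themselves.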
It is unfortunately the case that we do not at present know
at all well the significance of any school ability or of any of the
tests which we have devised as convenient means of sampling abil-
ities. We need not blame ourselves for this: the educational meas-
urements now in use are much better than none at all. They do
excellent service, provided inferences are made with proper caution.
They will do still better service in proportion as the correlations
of each are determined. This work is extremely laborious, but
sound method requires it.

Consider next certain principles of method designed to free 
measurements from certain pernicious disturbing factors, notably 
unfair preparation for the test, inequalities in interest and effort, 
and inequalities in understanding what the task is.

The best protection against unfair preparation is the provision 
of many alternative tasks of demonstrated equality in difficulty. 
This again means extremely laborious and uninteresting work, 
which nevertheless requires expert talent. It should be subsidized. 

There is and can be no absolute assurance of equality in interest 
and effort. Any educational product is a product of ability condi- 
tioned by interest. All that we can do is to choose such conditions 
for the test as are found to reduce inequalities in interest and effort 
to a minimum (that is, to show high correlations with the composite 
of results obtained with a sampling of all conditions likely to influence
interest). There is reason to believe that, when the test is
taken as a part of school work, the appeal to group competition, as in 
“We wish to find out whether you can do as well as the sixth-grade 
children in Boston did,” and a promise to report the results to 
each individual, are useful. In the case of high-school and college 
students a small payment in money or release from tasks, together 
with the promise of a full report to each individual, seems a useful 
method. 

Inequalities in understanding what the task is may be reduced
by a preliminary trial, identical in form with the test itself but
with very easy content, and by giving special tuition to any pupil
who fails in this preliminary trial. Instructions should be in simple
language and should always be accompanied by at least three con-
crete samples of the task.
One who is eager to find imperfections can find many in present
measurements of educational products. Nor is it a hard task to make 
constructive suggestions for improvement. An intelligent stu- 
dent of education could probably in a single day note a score of 
sure ways of improving the scales and tests which we now use. 
That is really child’s play. The hard thing is the actual expert 
work of remedying the imperfection, for this involves hundreds of
hours of detailed expert planning, experimenting and computing. 
What is needed in educational measurement is not the utterance by 
onlookers of criticisms and suggestions with which the men actually 
at work with measurements are as familiar as they are with their 
own names, but expert assistance in overcoming the defect. 

If those who object to quantitative thinking in education will 
set themselves at work to understand it; if those who criticise its 
presuppositions and methods will do actual experimental work to 
improve its general logic and detailed procedure; if those who are
now at work in devising and in using means of measurement will 
continue their work, the next decade will bring sure gains in both 
theory and practice. Of the gains made in the past decade, we may 
well be proud. 
In preparing this chapter on the specific uses of measure-
ment in the solution of school problems, the writer mailed a ques-
tionnaire to a selected group of school superintendents, most of
whom were known to have used tests and scales. Chiefly it was sought
to learn what changes in school organization and procedure had been
made as a result of such measurement.

Among 200 replies received there were 62 which reported some 
conscious alteration in the work of the school following the use of a 
standardized scale or test. In general, these changes may be 
grouped under six heads as follows : 

1. Changes in classification of pupils 

2. Changes in school organization 

3. Changes in course of study 

4. Changes in methods of instruction 

5. Changes in time devoted to subject 

6. Changes in methods of supervision 

Under these same heads it is convenient to group the remedial 
measures described in the periodical literature, and the discussion 
to follow will, therefore, make use of this classification. Under each 
head will be given results of the questionnaire. These results are 
fragmentary, but serve to indicate the range of things which school 
officers wisely or unwisely do as a result of information derived 
from educational measurements. Under each head will also be 
given one or more detailed examples of the type of remedial work in 
question. No attempt is here made to give an exhaustive review of 
the literature. Only suggestive cases are used. 
I. Classification of Pupils

1. Changes Indicated by Replies to Questionnaire

Changes in classification were of three sorts : 

(a) The promotion and demotion of individual pupils who
were found improperly classified. “We gave certain pupils double
promotions.” “We demoted and promoted pupils who did not fit
the grade they were in.” “Bright pupils were put into relief by the
tests and afterwards examined; a large number were promoted
thus.”

(b) A transfer of pupils from one grade to another for par-
ticular subjects. “Pupils were transferred in reading to grades for
which the tests revealed they were fitted.” “When pupils reached
Quality 10, Thorndike handwriting scale, in monthly tests they were
promoted into advanced section, meeting three times a week.” “Pu-
pils classified in reading according to score in test: (1) those below
Kansas Standard drilled in thought interpretation; (2) those who
equalled Kansas median given no extra attention; (3) those who
tested a grade higher allowed to drop reading for a time and work
on any study they were low in.”

(c) A general reclassification of entire school. “I classified 
my school below seventh grade so pupils could make up work where 
they were weak and take advanced work where they were strong." 

2. Detailed Example 

In the University of Minnesota High School 60 pupils were ac- 
cepted at the opening of the school year as beginners in the fresh- 
man class. These students were required to take English and gen- 
eral science and had a choice of two of the following: Latin, an- 
cient history, and mathematics. Before the first day of school they 
were tested, as a group, with the Trabue completion scale, an anal- 
ogies test, and an omnibus mental test. Later they were tested with 
Thorndike's reading scale Alpha 2, a series of arithmetical prob- 
lems, and a wide range of mental and educational tests. The in- 
itial tests were given in order to get a measure of the probable suc- 
cess of the several students and to find an intelligible basis for sec- 
tioning the group for purposes of instruction. On the basis of the 
test scores, the students were divided into two or more classes for 
the several school subjects. 
The school marks for these students were given in all by seven
instructors; each student was rated by four different teachers. The
marks were in letters A, B, C, D and F and were based upon a rel-
ative marking system. In the total marks to be here considered, A
was given to 8.6 percent of the class, B to 21.2 percent, C to 43.3
percent, D to 20 percent and F to 6.8 percent. In the work in gen-
eral science, where two sections were taught by different teachers,
the instructors conferred in giving the marks, so that the sixty
pupils were rated as a single group. This rating was in large meas-
ure on the basis of objective tests which were the same for all. Sim-
ilar objective tests were used in all subjects.

The marks to be here considered are those given at the end of 
the first month. These marks, more than those to be given later, 
are liable to inaccuracy as indicators of ability to do the high-school 
work. Now, comparing these marks¹ with the scores in the standard
tests, what do we find? 

The first and simplest measure of prophecy which we wish to 
find is the median attainment, for sixty pupils can be fairly divided 
into two sections for high-school instruction. If, by dividing them 
on the basis of the tests, we get the more capable half in the upper 
section and the less capable half in the lower section, the prophecy 
of the tests is confirmed. Figuring the median attainment, we find
that the omnibus test placed 72 percent of the sixty pupils in that
half of the class in which they were placed by the teachers’ marks
for the first month. Likewise, the reading scale Alpha 2 placed 72
percent correctly. When the scores of the two tests are combined,
76 percent are placed correctly. In other words, the tests placed
more than seven pupils in every ten correctly before they had done
a single day of high-school work.
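The "median attainment" check amounts to splitting the class at the median twice, once by test score and once by marks, and then counting how many pupils land in the same half both times. The following is a minimal sketch under stated assumptions. The six pupils and their scores are hypothetical, and pupils exactly at a median are arbitrarily counted in the lower half.

```python
from statistics import median

def median_split_agreement(test_scores, marks):
    """Share of pupils placed in the same half of the class by the test
    and by the teachers' marks. Pupils exactly at a median count as lower half."""
    if len(test_scores) != len(marks) or not marks:
        raise ValueError("need one test score and one mark per pupil")
    t_mid, m_mid = median(test_scores), median(marks)
    same_half = sum((t > t_mid) == (m > m_mid) for t, m in zip(test_scores, marks))
    return same_half / len(marks)

# Six hypothetical pupils: test scores and total mark points.
test_scores = [55, 40, 72, 31, 66, 48]
marks = [14, 11, 16, 9, 10, 13]
print(f"{median_split_agreement(test_scores, marks):.0%} placed in the same half")  # 67%
```

The fifth and sixth invented pupils are misplaced by the test. They resemble the idle able pupil and the industrious weaker one discussed in the text.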

As a matter of fact, the prognosis was even better than this, for 
all of the instructors were agreed that the school marks did not rep- 
resent the abilities of all of the pupils accurately. Thus, one boy 
of acknowledged ability stood 12 in the omnibus test and 22 in the 
reading test. In marks he stood 57, scoring C in history and fail-
ing in general science, mathematics and English. Every one, in-
cluding the boy himself, admits that through this month he idled
away his time. There is also the opposite type of case: a student
scored low in the tests but by unusual industry lifted himself above
the median line in marks. When one considers the part which in-
dustry plays and the variable factors of outside work, social dis-
tractions, parental interests and personal attitudes in relation to
school achievement, it is clear that intellectual ability alone is not
determinative.

¹These data are available through the courtesy of Dr. W. S. Miller, Prin-
cipal of the University of Minnesota High School.

It is encouraging to believe, however, that approximately eight 
children in ten are properly placed by such tests. This idea is 
supported by figuring the average mark of the students in the upper 
and lower halves of the tests. Equating the letters to the figures 
5, 4, 3, 2 and 1, it is evident that a student carrying four subjects 
could score at most 20 points. As a matter of fact, the upper half as divided
by reading test scored 13.4 points, while the lower half scored 10.3, 
and similar figures for the two groups as divided by the omnibus 
test are 13.5 and 10.2. If we divide the sixty pupils into three equal 
groups on the basis of the two tests, the upper third averages 14.2 
in marks, with no failures in a total of eighty marks. The lower 
third averages 9.8 with 17 failures in a total of eighty marks and 
only 9 marks above C. 
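The point scale can be written down directly, which makes the 20-point maximum concrete. Below is a short sketch that uses the letter values given in the text. The two-pupil group is hypothetical. The second example reproduces the idle boy's marks, C in history and F in the other three subjects.

```python
# Letter values as given in the text: A=5, B=4, C=3, D=2, F=1.
LETTER_POINTS = {"A": 5, "B": 4, "C": 3, "D": 2, "F": 1}

def mark_points(letters):
    """Sum a pupil's letter marks (20 is the maximum for four subjects)."""
    return sum(LETTER_POINTS[letter] for letter in letters)

def group_average(pupils):
    """Average mark points for a group of pupils, each given as a list of letters."""
    return sum(mark_points(p) for p in pupils) / len(pupils)

print(mark_points(["A", "A", "A", "A"]))  # 20, the maximum
print(mark_points(["C", "F", "F", "F"]))  # 6, the idle boy's first-month marks
print(group_average([["A", "A", "B", "C"], ["C", "D", "F", "C"]]))  # 13.0 (hypothetical)
```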

Reasoning from all these data, it is safe to say that both edu-
cational and mental tests may be used to classify students for pur-
poses of instruction, and that a few hours spent in preliminary exam-
ination will foreshadow later achievements to a high degree. Ad-
mittedly, much technique remains to be developed, but the prom-
ise is sure. Superior students can be detected and grouped to-
gether; mediocre students can be put with mediocre students, and
weak students, instead of being submerged in the struggle to main-
tain standing, can receive the help they need. It would be difficult
to overestimate the increase of efficiency that would come from the
better adaptation of instruction in consequence of such classifica-
tion.

How dependable the tests really are in promoting superior 
students is evidenced by numerous instances, of which the following 
may serve as an example: 
Robert was 12 years old, beginning second semester of eighth
grade. His teachers reported him indifferent, doing only ordinary 
work and inclined to be the center of schoolroom disorder and or- 
ganized insurrection. Parents noted that, though previously much 
interested in school, the boy now disliked to attend ; he disliked the 
teachers and wanted to drop out. Robert insisted that the studies 
were not interesting, that he knew all he wanted to know about them 
already. Mental examination showed an intelligence quotient of 
142, a mental age probably greater than that of some of his teach- 
ers, who bored him to death by treating him as an ordinary twelve- 
year-old. He was recommended to high school, entered three weeks 
late, led his class at the end of six weeks and at every subsequent 
interval when marks were given. More important, his whole atti- 
tude toward school was changed, because the advanced work was a 
real challenge to his mental ability. 

II. School Organization 
1. Changes Indicated by Replies to Questionnaire 

Seven types of change in school organization were indicated 
as follows : 

(a) Change in size of classes. “Smaller classes in arithme-
tic.” “More teachers in arithmetic.” “The services of additional
teachers demanded for defectives in industrial training.” “En-
largement of class for defectives.”

(b) Division of classes into special sections. “An advanced
and a special section made in writing on the basis of errors.” 

(c) Organization of Special Classes. “Opened special room 
for backward pupils.” “Organized special corrective work.” 

(d) Departmentalization. “Departmentalization of sixth,
seventh, and eighth grades.” “Department teaching.”

(e) Arrangement of parallel programs. “Spelling periods 
for different grades arranged for same time to permit transfer of 
brighter pupils.” “Same for reading.” 

(f) Appointment of supervisors and supervising principal.
“Position of supervising principal for primary grades created.” 
“Appointed a director for a newly organized Bureau of Research.” 
“Plan to hire a trained supervisor for writing next year.” 

(g) Inauguration of special schools. “An initial attempt 
to develop an elementary industrial school for pupils shown by the 
tests to be unfitted for the regular work.” 
2. Detailed Examples

(a) Parallel Programs. At the Lake Harriet School in Min- 
neapolis a measurement by the Ayres scale showed a wide distribu- 
tion of spelling attainment among the pupils of each class. In the 
8A grade, 43.3 percent of the children were of eighth-grade spelling 
ability. Grouped along with them were 26.7 percent of seventh- 
grade ability, 26.7 percent of sixth-grade ability and 3.3 percent of 
fourth-grade ability. More variable than this was the fifth-grade 
class, where the distribution showed every level of ability from the 
second to the eighth grade. On the basis of this showing, Miss
Probst, the principal, observed that it was practically impossible in
group instruction “to devise a spelling lesson which would tax the
capacity of each individual in the group.” It was determined,
therefore, to rearrange the grouping in such manner as to give every 
child “capacity work." To justify this regrading, further tests were 
given. With the results of all the tests as a basis, the pupils were 
redistributed so that those of like ability recited together. In the 
new organization the original 7B class retained four pupils from 
an original total of thirty-two. The others were distributed to new 
groups as follows : seven to 8A, six to 8B, thirteen to 7A and four 
to the sixth grade. To the new 8B group, the several grades contrib- 
uted in this fashion : 8A gave 7 pupils ; 8B gave 14 ; 7A gave 4 ; 7B 
gave 6; the sixth grade gave 6; the fifth grade gave 5, and the 
fourth grade one — a total of 42. 

When the redistribution of the pupils on the basis of ability 
had been made, the spelling recitations for all were arranged to take 
place between 11:45 and 12:00 o’clock on each day. At the sound 
of the bell, each pupil passed to the room where his particular level 
of work was in progress. There he would go each day until a defin- 
ite change in his work occurred. If the average of his work for two 
consecutive weeks should fall below 90 percent, he automatically 
gravitated to the grade below. If he maintained a standard of 99 
percent for four successive weeks, he was promoted to the grade 
above. To illustrate : two 8B boys whose beginning record showed 
about 65 percent dropped back to the fifth grade. Unable to main- 
tain the pace of this grade, they dropped back to the fourth. 
“Here,” to quote Miss Probst, “they evidently found their spell-
ing level, for they took a fresh start, were promoted to the fifth
grade, then to the sixth and finally to the 7B group.”

The chief outstanding result of the experiment was a very great 
diminution of the variability in the several grades. Whereas in 
the first test but 43.3 percent of the 8A pupils had eighth-grade 
spelling ability or better, 71.4 percent came up to this standard in 
the second test in May. In the 7A class, the corresponding figures
were 45 and 80; in the sixth grade they were 55 and 90.

(b) Special Classes in Schools. A second change in school
organization brought about by means of educational and mental 
tests is the institution of special classes for the education of excep- 
tional children. Wallin reports (1914) eleven types of such classes. 

(c) Organization of Bureaus of Research. Probably the most 
significant change in school organization growing out of the meas- 
urement movement is the organization of Bureaus of Research as 
a supplementary supervisory and administrative agency. Since 
this topic is treated at length in other chapters of this report an ex- 
tended statement is unnecessary at this place. 

III. Course of Study 

1. Changes Indicated by Replies to Questionnaire 

It was often difficult to differentiate changes in the course of
study from changes in methods of instruction. Under this head 
four types of change were more frequently noted than any others. 

(a) Change of Textbooks. In some cases the textbook was
merely changed for another book. In other cases, the book was ap- 
parently dropped altogether and other types of material substi- 
tuted. 

(b) Emphasis was changed by giving more space in the
course to different parts of a subject. This seemed to be especially 
true in arithmetic and in spelling. 

(c) Numerous replies indicated that the tests served to fix 
standards of achievement for different grades. 

(d) The organization of specialized curricula for special 
classes. 
2. Detailed Examples

(a) Specific Aims to be Achieved. The effect of educational
measurements on the course of study has been to fix specific aims 
in the several school subjects. With most of the standard scales 
and tests there have been proposed ideal forms for the several 
grades, or the average and median scores made by these grades have 
been set as desirable ends to be achieved. 

The specific aims to be accomplished may be grouped under 
three heads : aims in rate of work ; aims in quality of work ; and a 
combination of the two, under the head of efficiency. 

The accompanying table contains a representative set of such 
aims. The full meaning of these standard scores will be under- 
stood only in connection with the tests themselves. 

SAMPLES OF SPECIFIC AIMS

Activity                    Measured by                   Score

Rate of Silent Reading      Gray (Ancient Ships)          2.87 words per second
Quality of Silent Reading   Thorndike Scale for           7.50 on Scale Alpha 2
                            Understanding of Sentences
Word Knowledge                                            8.50 approximately on Scale A, A2 or B
Addition                    Courtis Series B              Rate 12, Accuracy 100 percent
Addition                    Woody Series A                Accuracy 9.01 scale points
Reasoning                   Stone                         Rate 8.75, Accuracy 90 percent
Spelling                    Ayres scale                   100 to 50 percent on Columns N to Z
Writing                     Ayres                         Rate, 79 letters per second
                                                          Quality, 62 Gettysburg scale
Language                    Trabue                        14 to 16 scale points
Composition                 Harvard-Newton                70 on scale
Grammar                     Buckingham                    5.13 questions answered
Geography                   Hahn-Lackey                   0 to 99 percent on 20 sets of questions
Any pupil who at the middle of his eighth school year can
achieve the above standards should be considered normal for the 
grade. A lower attainment clearly indicates the need of remedial 
work for the class or individual in question. A distinctly superior 
achievement is evidence of superior intelligence on the part of the 
pupils or of superior methods of instruction. 
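Checking a pupil or a class against the table is a simple comparison. Below is a sketch in Python. It uses only standards that the table gives as single numbers, and it assumes that a higher score is better on each of the measures included. The pupil's scores are hypothetical. A score exactly at the standard counts as normal for the grade.

```python
# A few eighth-grade standards taken from the table of specific aims.
STANDARDS = {
    "Gray silent reading rate (words per second)": 2.87,
    "Thorndike Alpha 2 (sentence understanding)": 7.50,
    "Courtis Series B addition rate": 12,
    "Harvard-Newton composition": 70,
}

def needs_remedial_work(scores):
    """List the measures on which a pupil (or class median) falls below standard.
    Assumes higher is better for every measure in STANDARDS."""
    return sorted(name for name, score in scores.items()
                  if name in STANDARDS and score < STANDARDS[name])

pupil = {  # hypothetical scores for one pupil at mid-year of grade 8
    "Gray silent reading rate (words per second)": 2.40,
    "Thorndike Alpha 2 (sentence understanding)": 7.90,
    "Harvard-Newton composition": 65,
}
print(needs_remedial_work(pupil))
```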

(b) Minimal Essentials. A second important movement mak-
ing use of measuring methods is the effort to derive minimal essen-
tials. Of the nine methods of determining the minimal content of
the course of study described by Coffman, at least four are essen-
tially methods of measuring the acquired behavior of children and
adults; two are measurements of the content of reading matter, and
the other three are concerned with pooled opinions. Of the ten
chapters in the Sixteenth Yearbook of this Society, “Minimal Es-
sentials in Elementary School Subjects,” five make direct use of
information derived from the measurement of children’s attain-
ments. The other five are based on measurements of the content
of books and other published material.

It is unnecessary to enter here into a detailed statement of these 
investigations. It is sufficient to note that the determination of the
content of the course of study must follow two fundamental prin- 
ciples suggested by the following questions: (1) What should 
children know and do as children and adults? (2) What can 
children at any age learn with profit ? The final answer to both of 
these questions must be obtained by measurement. 

IV. Methods of Instruction 

1. Returns from Questionnaire
(a) Increased Emphasis. Approximately one-third of the 
replies note “increased emphasis,” which apparently means more 
time given to a subject or more value placed upon efficiency in it. 
The following are representative of these replies : 

“More emphasis, all grades, on meanings of words and sen- 
tences.” “Stressing legibility in writing.” “Greater emphasis 
on fundamentals in arithmetic.” “More emphasis on correct use 
of words in reading, less on definition.” “Special emphasis given 
to those subjects where standard was low.” “More drill in all
grades.” “More silent reading.”
(b) Drill. The favorite recourse for improving attainments
which the tests show to be poor is “drill.” This term is vague,
often meaning merely more time devoted to repetition of certain
processes.

“Arithmetic: All teachers required to give more drill work.”
“Special emphasis given to those subjects where standard was low.”
“Five-minute daily drill on fundamentals in arithmetic.” “More
time and attention given to drill in number combinations.” “Dic-
tation drills with attention on punctuation.” “More intensive drill
in grammar and punctuation.” “More drill in spelling.”

(c) Specialized Drill. Some correspondents report a partic-
ular type of drill, involving a definite change in the details of the
drill process as well as a change of time:

“Courtis drill cards in arithmetic in two rooms; more oral 
drill in all.” “Horace Mann method of spelling adopted.” “In- 
stalled Palmer method of writing.” 

(d) New Devices. Another method of improvement of in- 
struction was the invention of special devices. 

“Made room charts showing individual’s work (median, quar-
tile, safety zone).” “Teachers used questions similar to Kelley test,
and applied to geography and other subjects.” “More instruction
through interest of pupil.” “Supervised study periods, all
grades.” “Three periods of supervised study added to school day:
those who failed in one subject required to stay 1 period; in two, 2;
etc.” “Greater use of dictionary for meaning of words.” “Tests
devised to watch pupils’ progress.”

(e) Individualized Teaching. The measurements serve to fix 
attention on individual differences among children and to further 
the adaptation of instruction to individual needs. Illustrations: 

“Individual attention; specifics devised for securing appreci- 
ation of good writing.” “Individual help given slow pupils.” 
“Methods adapted to ability of pupils.” “Backward pupils dis- 
covered and given special attention.” “Promotion more by sub- 
jects.” 

2. Detailed Examples 

(a) Methods of Drill in Arithmetic. Mead, at the request of 
Superintendent Condon of the Cincinnati schools, undertook the 
experimental evaluation of two kinds of practice material in the 
fundamentals of arithmetic. The materials in question were the 
Courtis Standard Practice Tests and the Thompson Minimum Es- 
sentials in the four fundamentals. About 900 fifth-grade pupils 
from fourteen schools were divided into two approximately equal
groups and were drilled 15 minutes daily from February to May.
The efficacy of the two kinds of practice material was determined by
preliminary and final standard tests with each group (Courtis
Series B). The initial and final tests, as well as the intervening prac-
tical exercises, were given by the class teachers under standardized
instructions, and care was taken to keep conditions constant and fa-
vorable.

The results of this carefully controlled experiment show that 
in speed of work the two kinds of practice material produce im- 
provement, and essentially the same amount of improvement. Both 
also show gains in accuracy of work, but they differ essentially in 
the amount of improvement in accuracy resulting from the exer- 
cises. The pupils of the twelve classes using the Thompson Mini- 
mum Essentials show median gains in percent of accuracy for the 
four fundamentals of 2.5, 4.0, 2.9, and 15.7 ; similar figures for the 
Courtis Practice Tests were 9.7, 8.1, 8.9, and 18.2. Clearly, the 
Courtis Practice Tests are superior, and the test shows in what re- 
spect they are superior. 
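The comparison becomes explicit when the two sets of median gains are subtracted. The text lists the four figures without naming the operations, so the sketch below assumes the conventional order of addition, subtraction, multiplication, division.

```python
OPERATIONS = ["addition", "subtraction", "multiplication", "division"]  # assumed order
thompson = [2.5, 4.0, 2.9, 15.7]  # median gains in percent of accuracy
courtis = [9.7, 8.1, 8.9, 18.2]

# Courtis gain minus Thompson gain for each fundamental, rounded to one decimal.
advantage = {op: round(c - t, 1) for op, c, t in zip(OPERATIONS, courtis, thompson)}
print(advantage)
```

Every difference is positive, which is the sense in which the Courtis material proved superior in accuracy under the conditions of this experiment.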

It does not follow from this that the Thompson Minimum Es- 
sentials are of no value or that in another grade and under other 
conditions they might not prove superior. What is fairly certain 
is that, upon the use of the two methods in fifth-grade classes under 
school conditions prevailing in Cincinnati schools from February to 
May, 1916, and described by Mead, the debate has closed. 

(b) Teaching of Handwriting. The “two-squad” method
has been used by Mr. A. G. Capps in measuring the effect of “diag-
nosis and corrective measures in the teaching of handwriting.”
The handwriting of 44 sixth-grade children was measured on Octo-
ber 9. All who scored high on the Thorndike scale, Quality 9 or
more, and whose papers were relatively free from common errors,
were put into an “advanced class.” All others were put into a
“special class.” The advanced class was taught three days per
week; the special class, five days per week. A pupil was allowed to
pass from the “special” to the “advanced” class when on the
monthly tests he scored Quality 10 (Thorndike).
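The placement rules of the two-squad method can be stated as two small functions. This is a sketch with two assumptions. "Relatively free from common errors" was a teacher's judgment, so it is modelled here as a simple yes/no flag. The text describes no rule for moving a pupil back from the advanced class, so none is applied.

```python
def initial_squad(quality, relatively_free_of_errors):
    """October placement: Quality 9 or more and few common errors -> advanced."""
    if quality >= 9 and relatively_free_of_errors:
        return "advanced"
    return "special"

def after_monthly_test(squad, quality):
    """A special-class pupil passes to the advanced class on scoring Quality 10."""
    if squad == "special" and quality >= 10:
        return "advanced"
    return squad  # assumption: no rule for moving back is described

print(initial_squad(9.2, True))           # advanced
print(initial_squad(9.4, False))          # special
print(after_monthly_test("special", 10))  # advanced
```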
All pupils were treated by the same general method, which was
as follows: The handwriting of each pupil was diagnosed follow- 
ing Freeman’s analysis of “quality” into alignment, spacing, 
slant, quality of line, and letter form. On the basis of this diag- 
nosis, remedial treatment was prescribed for each class and for each 
individual, and detailed methods of instruction were worked out. 
The pupils were made acquainted with their own difficulties and 
taught to practice with a view to achieving certain detailed aims, 
such as better form for the letter a, improved alignment, etc. The 
experiment was continued for five months, and standard tests were
given at four-week intervals.

The significant results of the experiment may be stated as fol- 
lows: (1) The advanced class improved in quality from 9.06, or 
less than fifth-grade attainment, to 10.47, or slightly more than 
seventh-grade scores (60 minutes per week) . (2) The special class 
improved from 8.74, or fourth-grade score, to 9.68, or slightly less 
than sixth-grade average (100 minutes per week). In both classes 
there was a substantial gain in speed. 
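The two quality gains are easier to compare when computed side by side. The sketch below only subtracts the reported figures and does not test whether the difference between the classes is reliable. The two classes also started at different levels, so the comparison says nothing by itself about the value of the extra minutes given to the special class.

```python
# (initial quality, final quality, minutes of instruction per week) as reported
classes = {"advanced": (9.06, 10.47, 60), "special": (8.74, 9.68, 100)}

gains = {name: round(end - start, 2) for name, (start, end, _) in classes.items()}
for name, (start, end, minutes) in classes.items():
    print(f"{name}: gain {gains[name]:.2f} quality points with {minutes} minutes per week")
```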

In the light of these results, regarded as tentative by the ex-
perimenter, there can be little doubt that this teaching method has
considerable claim to efficiency.

(c) Improvement in Written Composition. Somewhat differ- 
ent in type is the study by Brown and Haggerty of the improve- 
ments in English composition. The weekly composition exercises of 
three high-school classes through a period of twelve weeks were 
measured by the Harvard-Newton Scale. By this method it was pos- 
sible to secure a “learning curve” for individuals as well as classes, 
and thus to see the educational behavior of a student somewhat 
more intimately than when only initial and final measurements are 
made. 

From this experiment several facts stand out. (1) There is
no essential difference in composition performance among classes
rated as first-semester freshmen, second-semester freshmen, and
first-semester sophomores. (2) In general, classes gain in power
to write during a twelve-weeks period about four or five Harvard-
Newton Scale points. (3) Some students, relatively poor ones,
gain much more than the class average, in some cases as much as
twenty scale points. (4) Other students, often the superior indi-
viduals, make little or no improvement; some individuals do more
poorly at the end of the twelve-weeks period than at the beginning.
(5) Some composition topics elicit better products than others, as
shown by the median scores of all classes.

This simple experiment does not get very far in solving the man- 
ifold perplexing problems of composition teaching. The method, 
however, offers a means of further investigation. 

V. Time Distribution

The changes in time were mostly of the nature of increased 
time to subjects where the tests showed the product to be low grade. 
The following are characteristic replies:

“Lengthened periods for subjects where deficiency was greatest.”
“More time given to arithmetic.” “More time given to writing
in all grades.” “Ten minutes of arithmetic recitation period
used for drill three days per week.” “Increased number of hours
for industrial work.”

Only one correspondent reported a diminution of time,
saying, “We excused pupils and grades from work ‘over done’.”
Numerous studies, from that of Dr. Rice on, have shown that results
of school work are not directly correlated with the time spent on the
subject. Studies in handwriting, spelling and arithmetic show that
the maximal time may yield poorer returns than a smaller amount
of time.*


VI. Changes in Supervision 

As a rule, tests have been introduced into the schools through 
the supervisory and administrative officers. It is not surprising, 
therefore, to find that the results of the tests have had a direct and 
considerable effect on the detailed work of such persons. 

The general supervisory practice is to report the results of tests 
to principals and teachers. Such a report gives not only the scores 
of the class or classes concerned but also comparable norms from 
other classes within the system and from other systems. Appar- 
ently some supervisors end their use of the tests with this report. 

*Some concrete studies in this field were reviewed by the writer in School 
and Society, November 19, 1916. 
A second step taken by other supervisors is a conference
between the teacher and supervisor, a conference in which “the
opinion of the supervisor is not pitted against that of the teacher, but
in which the attention of both is directed to the definite and
objective result of the test and to the causes producing such result.”
Growing out of such conferences come “new outlines of work,”
“re-organization of programs,” “special lessons for teachers in
teaching spelling,” “requirements in the handwriting of teachers,”
in fact, any one of the changes hitherto enumerated.

A third step to be noted is a further examination of a class or 
pupil for the extension of the diagnosis. This is made with other 
tests and through personal investigation. 

A fourth step taken by some supervisors and teachers is a 
second measurement, after a period of remedial work, to test the 
efficacy of the changed program. In about one-fifth of the cases 
where remedial work was reported, the supervisor had used the 
tests in this way. The following are typical replies:

In arithmetic: “Graph showed greater improvement in grade in
one month, than first graph showed from class to class (five months’
work).” “Have used drill and are now above the Courtis
Standard.” “The weaker pupils do better work. More pupils brought
up to required standard. Pupils more accurate.” “Plateau
disappeared which had existed from Grade VI to VIII; curve for both
attempts and rights in all operations shows gradual development
to Grade VIII.”

In reading: “Improvement in median score from September 7
to January 23; Grade VII, 5.3, Grade VIII, 5.6.”

In spelling: “Pupils average a grade higher.” “More
uniformity of grades and children know where they stand.”

In writing: “Better quality. Both quality and speed nearer
average for grade.”

A fifth important use of tests by supervisors is to keep track
of the normal progress of students and classes. Results of such tests
serve as a sort of weather map of the school system and show the
prevailing winds. They enable the supervisor to know in an intimate
way the entire system and to direct his efforts at supervision where
they are most needed. Where fair progress is being made he can
give little attention and save his usually inadequate energy for the
places where it is most needed.
Finally, it is apparent that supervisors are making the class
attainment, as shown by the results of measurement, one factor in
the rating of teachers and a basis for the further professional
training of teachers. The following replies show this:

“Introduced a system of teacher training.” “Certain teachers
required to go to summer-school.” “Teachers required to visit
classes with strong teachers.” “In second semester, changed
teacher whose class was weak to a lower grade class, to test power
and presentation of subject.” “Certain changes in both principals
and teachers, and will require certain teachers to attend
summer-school.”

The experience of one supervisor is related in a most interest- 
ing paper on the Supervisor’s Use of Standard Tests, by J. C. Morri- 
son, who presents the details of a year’s work in Chatham, Mass., 
and sums up the work in these words:

“Standard tests have proved an effective means in super- 
vision. Through their use teachers are improving the technique of 
their method and exercising a nicer judgment in the relation of sub- 
ject matter. Pupils are working to exceed their own record. Teach- 
ers and older pupils are coming to understand the scientific idea of 
education as it applies to the ordinary classroom. In no other way 
could the principal gain so close a knowledge of each individual
child in the school. This knowledge is serviceable in placing new
pupils, in determining promotions, in selecting accelerates and
defectives, in searching out the special difficulties of the individual,
and in gaining the cooperation of the parent. The results of the 
tests have proved of interest to the public. On the basis of these 
results the board of education has employed a teacher to give ap- 
proximately one third of her time to the testing and supervision of 
work with special pupils. The school has made a start in the study 
of its children and will eliminate a large part of the wasted time 
and effort that results from the choice of the wrong high-school 
course.” 

In this survey of remedial measures based on the use of stan- 
dard tests one observes that many of the changes made are the time- 
honored ones which school officers have traditionally made on the 
basis of personal opinion and in response to changing ideas.
Apparently what the tests do in such cases is to render definite the
arguments for these changes and to make accurate the evaluation
of the efficiency of remedial measures when once they have been
carried out. To a schoolman who cares to guide his practice by facts
rather than by debate, this service sufficiently justifies the idea
of measurement. To argue the case with other types of individuals
is, perhaps, a waste of time.

There is one thing, however, that the tests show and that traditional
practice has never recognized, because it never knew it: the
enormous range of individual differences among children, both in
ability and in attainment. These differences the tests reveal in a
way that must inevitably alter profoundly our whole program of
education.



CHAPTER IV 

GENERAL ORGANIZATION OF EDUCATIONAL MEASUREMENT
WORK IN CITY SCHOOL SYSTEMS


Frank W. Ballou

Director, Department of Educational Investigation and Measurement,
Boston, Massachusetts


Every change which takes place in educational practice is pre- 
ceded by a period of agitation. The profession must be made aware 
that a given condition is unsatisfactory and must be convinced that 
the proposed change will bring about improvement. In addition, 
the lay public must be educated to understand the meaning and 
significance of the proposed change. Many educational schemes 
never proceed beyond this stage of professional and public agita- 
tion. 

Following the period of agitation comes a period of trial and 
experimentation, in which the proposed change is subjected to close 
scrutiny. Generally, both the profession and the public produce
searching and frequently unjust criticisms. These criticisms usually
prevent even a trial of any changes in our educational practice
which do not give promise of a reasonable degree of success.

In those cities where special research departments have been 
organized, educational measurement is established beyond the stage 
of argument or debate. There the movement has passed success- 
fully through the periods of agitation and of experimentation. Even 
through the country generally, the movement has received such gen- 
eral recognition and endorsement during the past few years that 
progressive school systems do not now need to be provided with 
arguments in favor of educational measurement. This chapter, 
therefore, is not an argument for the introduction of educational
measurement: it is not a discussion of the debatable phases of the
subject; neither is it a delineation of the educational advantages to
be secured by measurement. Rather, it is a description of ways
and means of introducing systematic educational measurement in
a city school system and carrying it on successfully. The methods 
suggested are those which appear to have succeeded best in those 
city school systems that have undertaken to measure educational 
results in an organized way. 

The time for the introduction of educational measurement into 
a school system should be wisely chosen; standard tests should be 
given only after the way for them has been carefully prepared. The 
success of any educational reform depends on the intelligent co- 
operation of the members of the educational profession. This is 
especially true of educational measurement, because it involves an 
entirely new method of attacking educational problems and an al- 
tered attitude on the part of teachers toward the whole educative 
process. Success is contingent on a thorough understanding of the 
aims, methods, possibilities, limitations, and achieved results of 
the use of standard tests and scales. Unless and until the profes- 
sion is so informed, and as a result is prepared to cooperate effect- 
ively in its administration, the possible values of educational meas- 
urement are not likely to be secured. Proper instruction through 
lectures, talks, and teachers' meetings will do much toward pre- 
paring the way for success in carrying on such work. 

Educational measurement involves selection of the tests to be 
used, testing the children, marking or scoring the papers, tabulat- 
ing and interpreting the results, and utilizing the conclusions as 
a basis for securing improvements in teaching when the outcome 
proves that the present teaching is unsatisfactory. This chapter 
deals with each of these topics. 

A. Selecting the Tests 

The attitude of teachers and others towards educational meas- 
urement depends largely on first impressions. The importance of 
making favorable first impressions, therefore, cannot be
overemphasized. The first standard test selected to be given in a school
system should be one which is most likely to be favorably received 
by teachers. Fairly satisfactory tests for several subjects or phases 
of subjects are now to be had.* 


*See Chapter VII for a list of available standard tests and scales.
Adequate standard tests for any school system should possess
the following characteristics: they should measure educational
products obviously within the scope of the course of study of the city;
they should aim to measure those subjects or phases of subjects
which are clearly measurable; they should be reasonably simple;
they should be accompanied by adequate instructions as to how they
are to be given and how the results are to be scored; they should be
scored and tabulated with reasonable ease; they should already have
been given to a sufficient number of children so that well-founded
standards of achievement have been established.

The demand for standard tests in various subjects has been so
great that some questionable tests have been put on the market.
For example, a so-called ‘standard test in spelling’ is available
which tests the ability of children in Grades V to VIII to spell such 
words as the following : nunciature, sphericity, hoggery, senescent, 
symmetrize, incremental, rigmarole, verisimilitude, anthropometric, 
tubule, erosible, and divestiture. These words are not to be found in 
the course of study in most school systems in the United States. 
Lack of ability to spell such words cannot be charged against the 
schools, because the schools have not undertaken to teach children 
to spell them. This test and others like it, which do not measure
classroom instruction, should be avoided, at least in the beginning
of testing work. If they are used, the purpose in giving them and
their limitations should be distinctly understood. There are satis- 
factory standard tests in spelling which are obviously within the 
scope of classroom instruction. 

Criticism has been urged against educational measurement on 
the general ground that important products of good teaching in cer- 
tain phases of all subjects and in all phases of certain subjects can- 
not be quantitatively measured. Whether this is so or not may be 
debatable: let the debate go on. It is not necessary, however, to 
postpone all educational measurement until the debate is settled. 
In the meantime, giving standard tests in measurable subjects or in 
measurable phases of subjects will be profitable for teachers and 
pupils, and may also furnish valuable information by which to 
determine to what extent the results of educational practice gen- 
erally are measurable. 
In introducing educational measurement into a school system
for the first time, the simpler the tests are, the better. To be
usable, standard tests must be accompanied by adequate instructions
for giving them and for marking the papers. Unfortunately there is 
no royal road to travel in carrying on educational measurement. 
The details of giving the tests must be mastered by those who would 
make proper use of them. If tests are worth giving at all, they 
should be given according to the systematic plan which the author 
of the tests has worked out. Success in introducing testing into a 
school system will be more certain if the tests are selected only after 
careful consideration of the instructions for giving and scoring 
them. 

It is also important that well-recognized tests be used in the 
initial testing in a school system. Not everyone who desires to give 
standard tests has the time, resources, or qualifications for prepar- 
ing his own tests. It cannot be too urgently recommended that at 
first tests should be selected from those already available. Later, 
one may successfully experiment with the preparation of the tests. 
Among the gravest dangers which educational measurement faces 
today is that which arises from amateurish attempts at the con- 
struction of standard tests by those who do not realize the need for 
a careful testing of the tests themselves before they are published 
for general use. 

Well-recognized tests are urged for use: first, because such
tests are undoubtedly superior to those which would be prepared 
under most circumstances by a beginner ; second, and more partic- 
ularly, because the results achieved in any school system can thereby 
be compared with similar results from other school systems. 

One of the desirable outcomes of the giving of standard tests 
is the establishment of standards of achievement. Such standards 
furnish measures with which one may compare his own results. 
Standards are adequate, however, only when they are based on a 
large number of results. Tests which have been standardized from 
the results achieved by a small group of pupils in one or two school 
systems are to be avoided unless there are other quite special reasons 
for using them. 
Testing should always be purposeful. Testing children merely
for the sake of giving tests cannot be too thoroughly condemned.
The following principle of practice is commended: never give a
standard test unless you have a definite purpose related to the
improvement of instruction and unless you are prepared to tabulate,
interpret, and use the results at once.

Since the ultimate purpose of all educational measurement is 
the improvement of the instruction, it also becomes of paramount 
importance that the results be made known to the teachers and 
others before their interest in the tests has waned. 

B. Getting Tests Given 

Let us repeat that if the results obtained from giving standard 
tests are to be worth the time, money, and effort expended, the tests 
must be properly given. This means not only that they must be given 
in the same manner in every classroom throughout a school system, 
but it means, also, that they must be given according to the direc- 
tions that accompany the tests. If this is not done, the results 
cannot be compared with the standardized achievements in other 
cities — a fact which renders the interpretation of the results very 
difficult, if not impossible. 

It may be assumed that the author of the tests had reasons 
for adopting the procedure indicated by his instructions, and that 
the most effective use cannot be made of the results obtained in any 
school system unless those directions are followed. Even though 
one does not agree with the directions for conducting the tests,
it would be better not to use the tests at all than to deviate in any
important respect from the author's directions for giving them.

Tests may be given either by principals, supervisors, or teach- 
ers, or by persons especially trained for the purpose. Such tests 
as spelling tests, which are among the simplest to give, may be 
given satisfactorily by principals or teachers. They do not require 
careful timing, and the directions for giving are comparatively sim- 
ple and are easily followed. With a minimum of instruction any 
principal or teacher can successfully conduct spelling tests. 

Such tests as the Courtis standard tests in arithmetic offer a 
more difficult problem. They must be accurately timed and the 
conditions under which they are given must be controlled. Those
who give such tests must be specifically trained for the purpose.

For this purpose one of two methods may be followed : a group 
of teachers or principals in the service may be instructed, outside 
of school hours, and assigned to give these tests in the various 
classes throughout a city. Or, as in Boston and other cities, those 
preparing to become teachers may be trained to give these tests 
as a part of their preparation for teaching. In Boston each mem- 
ber of the senior class in the Normal School spends one month in 
the Department of Educational Investigation and Measurement and 
receives instruction in the meaning of educational measurement 
and the training necessary for giving such tests as the department 
desires to have given. Following this training, these seniors are
assigned to give the tests in the various schools of the city.* Very
similar arrangements are made in many other places. Cities in 
which a college or university is located have also used trained col- 
lege students of education as special examiners. 

Experience has not yet determined which practice is likely to 
be followed in the future. It is certain that whoever gives tests 
must receive proper instruction in the methods of giving them. The 
policy of training normal-school seniors in measurement work is 
based on the assumption that the teaching staff of a city ought to 
be competent to give such tests as are needed for the measurement 
of the work of a school system. If the teachers of a city can be so 
trained, this is undoubtedly the cheapest and most effective method 
of solving the problem. If, for one reason or another, teachers do 
not prove satisfactory as examiners, then it necessarily follows that 
competent specialists must be employed for giving the tests, as well 
as for carrying on other phases of educational measurement.
Whatever be the method finally adopted, those who give tests must not
only be properly instructed, but their work must also be adequately 
supervised. This supervision is the function of a department of 
educational research. In the absence of such a department in a 
school system, the superintendent’s office should assume this re- 
sponsibility. 

One of the difficulties in relying on the teaching staff as a 
whole to give standard tests grows out of the attitude of teachers 

*See article by the writer in School and Society, Vol. V, pp. 61-70.
toward such tests. In their attitude toward educational measurement
one may expect to find three distinct classes of teachers in
every school system: (1) those who endorse it, (2) those who are
indifferent to it, and (3) those who oppose it. On this account it is
probably wise to introduce standard tests into a school system on a
voluntary basis; that is, to give the tests only in those schools or classes
where principals and teachers are willing to have them given.

The success of educational measurement in a school system 
depends on doing a small amount of testing and doing it well, rather 
than on doing a large amount. Quality rather than quantity is more 
likely to win favor. Opposition to educational measurement can 
best be disarmed by showing the improvement secured from the re- 
sults of standard tests. 

C. Scoring the Papers 

The amount of time involved in giving tests is small compared 
with the time necessary for correcting the papers. Assuming that 
the teachers give the tests, who shall mark them? The answer to
this question depends largely on the character of the tests.
Certain types of papers from some tests can be satisfactorily marked
by the pupils under the supervision of the teacher. For example,
if a spelling test is given by the teacher, she may at once spell the 
words aloud and have the children mark the errors, following the 
common practice. The pupils’ scoring in every case, however, 
should be properly checked by the teacher. Likewise, in some tests 
in arithmetic, children may be provided with answer cards and 
shown how to correct the papers. 
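Marking a paper against an answer card, as described above, amounts to comparing each response with the key. The Python sketch below is one way to express this. The word list and the function name `score_paper` are hypothetical, and counting an omitted word as an error is an assumption about marking practice rather than a rule taken from any particular test's directions.

```python
# HYPOTHETICAL answer key; the words are illustrative only.
ANSWER_KEY = ["receive", "separate", "believe", "necessary"]


def score_paper(answers: list[str], key: list[str]) -> tuple[int, list[int]]:
    """Return (number right, 1-based positions of errors).

    An answer counts as right only if it matches the key, ignoring
    case and surrounding spaces. A missing answer (paper shorter
    than the key) is counted as an error (an assumption)."""
    errors = []
    for position, correct in enumerate(key, start=1):
        given = answers[position - 1] if position <= len(answers) else ""
        if given.strip().lower() != correct.lower():
            errors.append(position)
    return len(key) - len(errors), errors
```

A teacher checking a pupil-marked paper could compare the pupil's count of errors with the positions returned here; the same comparison applies to an arithmetic answer card.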

The scoring of many tests, however, involves the exercise of 
some judgment, and pupils cannot be depended on to score such 
tests. In most cases, papers in geography and history should not be 
scored by pupils. Further, if one desires to give credit for a correct 
method in problem work in arithmetic, even though the answer is 
incorrect, the judgment of the teacher is undoubtedly necessary. 

Teachers should not be burdened with an unreasonable amount 
of work in the giving of standard tests. A satisfactory prin- 
ciple to follow may be stated as follows: teachers should be expected 
to correct test papers and tabulate results only in so far as this work 
will increase the teacher’s knowledge of the abilities of pupils in her
class. Putting this principle into practice would mean that in most 
cases the teacher may be expected to mark the individual papers 
of pupils in her class. She may be expected, also, to make class 
tabulations or summaries of individual achievements, because only 
by so doing can the teacher learn where her class stands in relation 
to other classes in the school, or in relation to the general standard 
of achievement established for the test. Beyond class summaries 
the work is likely to be too largely clerical to be of direct value to 
the teacher, and should be carried on by others. 
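A class tabulation of the kind just described, placing the class against the general standard for the test, can be sketched in Python. The standard and the scores used with it are hypothetical. Summarizing the class by its median is an assumption, chosen because the median is the measure quoted in the reports cited earlier in this volume; each test's own directions should govern in practice.

```python
from statistics import median


def class_summary(scores: list[float], standard: float) -> dict:
    """Summarize one class's scores against a grade standard.

    Returns the number of pupils, the class median, the number of
    pupils at or above the standard, and the difference between the
    class median and the standard (negative means below standard)."""
    class_median = median(scores)
    return {
        "pupils": len(scores),
        "median": class_median,
        "at_or_above_standard": sum(1 for s in scores if s >= standard),
        "difference_from_standard": round(class_median - standard, 2),
    }
```

For example, a hypothetical class scoring 8, 9, 11 and 14 against a standard of 12 has a median of 10, two points below standard, with one pupil at or above it.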

In marking papers and making class tabulations, teachers 
should have some one to whom to look for instruction and guidance. 
If there is no regularly organized department to supervise and di- 
rect educational measurement, some qualified person in the system 
should be assigned to do it. Through teachers’ meetings held after
school, for which classes may be dismissed a half hour or an hour 
early and through instructions issued from time to time, the 
supervisor of educational measurement can materially lighten the 
burden which otherwise falls on teachers in this type of work. 

D. Tabulating and Interpreting the Results

In Boston the Department of Educational Measurement has 
found an effective and economical way of making grade summaries, 
school summaries, and city-wide summaries of testing results. Girls
from the Boston Clerical School are assigned to this work. These 
girls are being trained specifically for clerical work and on gradua- 
tion from school may take positions which require skill in just the 
kind of work involved in making tabulations from tests. The Depart- 
ment asks the Clerical School to send relays of eight or ten girls as 
long as the work lasts. Each group of girls reports for three or four 
days to the office of the Department where the tabulating is to be 
done. The Department instructs them in the special methods of tab- 
ulation, and has the work so systematized that a definite record is 
kept of the speed, accuracy and effectiveness of the work of each 
girl. Their attendance is also kept and reports are made to the head 
master of the school when the girls return. The teachers in the Cler- 
ical School consider this work an important part of the practice and 
training which the school desires to give. The Department has found
it a satisfactory method of getting the work done. The expense to the 
city involves only the car-fare of the students to and from their 
homes to the office of the Department. Much the same method has 
been followed in other cities that have commercial departments in 
their high schools. It affords a practical solution of what is often 
a critical problem in measurement work. 

If teachers or prospective teachers are trained to give the tests, 
and to score the papers of their respective classes, and if commercial- 
high-school pupils or normal-school pupils are used to make grade 
summaries, school summaries, and city-wide summaries, little or 
no direct expense is involved in that part of the work. But even 
though members of the teaching staff or pupils in the school system 
are thus employed, many additional tabulations will be desirable 
and necessary if the results are to serve their greatest usefulness. 
For such work it is essential that competent clerical help be pro- 
vided. It is the kind of work which is not easily done by those un- 
familiar with it, and is likely to be tedious as well as voluminous. 
To be done effectively it should be done by those who understand it 
and who have more than ordinary interest in it. Such persons can 
be found in every school system and when found, should be assigned 
to render this kind of service. No city should undertake educa- 
tional measurement without understanding that it involves some 
expenditure : a portion of this should be devoted to the securing of 
competent clerical assistance for statistical tabulations and another 
portion to the general direction and supervision of the work (by 
the superintendent of schools in the smaller cities and by a special 
school official in the larger cities). 
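The grade, school, and city-wide summaries mentioned above can be built from the same class records. The Python sketch below assumes hypothetical records of the form (school, grade, pupil scores) and summarizes each group by its median. As a design choice it pools the pupils' scores within each group rather than taking the median of class medians, so that a large class counts in proportion to its size; that choice is an assumption, not a procedure prescribed in this chapter.

```python
from collections import defaultdict
from statistics import median

# HYPOTHETICAL class records: (school, grade, pupil scores).
records = [
    ("School A", 5, [10, 12, 14]),
    ("School A", 6, [13, 15, 16]),
    ("School B", 5, [9, 11, 13]),
]


def summarize(records: list[tuple[str, int, list[float]]]) -> tuple[dict, dict, float]:
    """Return (median by grade, median by school, city-wide median),
    pooling the individual pupil scores within each group."""
    by_grade = defaultdict(list)
    by_school = defaultdict(list)
    city = []
    for school, grade, scores in records:
        by_grade[grade].extend(scores)
        by_school[school].extend(scores)
        city.extend(scores)
    return (
        {g: median(s) for g, s in sorted(by_grade.items())},
        {sch: median(s) for sch, s in sorted(by_school.items())},
        median(city),
    )
```

With these invented records, Grade V has a median of 11.5, School A a median of 13.5, and the city as a whole a median of 13.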

After the various tabulations and summaries have been made 
by grades, by schools, and for the city as a whole, the results must 
be interpreted. This is one of the most important phases of educa- 
tional measurement. On the interpretation of the results really 
depends the usefulness of the tests. One of the greatest dangers 
involved in educational measurement throughout the country today 
is the fact that many persons are giving tests who are not com- 
petent to tabulate and interpret the results. In general, interpreta- 
tion should be made by the superintendent, or by some other school 
official thoroughly conversant with all the local conditions. The
best results will be obtained when some competent person, who is
especially interested in and trained for such work, is engaged by the 
board of education to supervise the giving of tests and to interpret 
the results to the teaching staff. 

E. Making Use of the Results

Successful educational measurement work in a school system 
involves not only selecting the best available tests, giving them ac- 
cording to directions, scoring the papers, tabulating the results, 
(all at a minimum of cost in time, energy, and money) and inter- 
preting the results, but it also involves getting the information de- 
rived from the tests to the persons concerned, getting those persons 
to consider it, and getting them to do something about the condi- 
tions revealed, if those conditions demand it. 

Obviously, some sort of report must be prepared by the official 
who interprets the results of the tests. Typewritten or mimeo- 
graphed copies of the report may be made for a small school system, 
but if a large number of copies is needed, the report should be 
printed. If printed, it is more likely to be given consideration by 
those concerned and to be kept for future reference and guidance. 

Much of the success in getting the desired information to 
those concerned will depend on the character of the report. With 
conditions as they are at present, the facts must as far as possible 
be stated in simple, non-technical language. Few teachers now in 
the service have any knowledge of the technical terms employed 
in educational statistics. If such terms are used, they must be care- 
fully defined. Our courses for the training of teachers are now be- 
ginning to include some study of educational measurement. It will 
be a generation, however, before the teaching profession as a whole 
can be presumed to have had training in educational statistics. Until 
that time it will be necessary to prepare non-technical reports for 
the teaching profession. 

Finally, it should be emphasized that standard tests are for di- 
agnostic purposes: they show the abilities and needs of children. 
Recently one teacher said: “My class has taken the Courtis tests in
arithmetic twice each year for three years, and I do not see that it
has done them any good!” On investigation, it was found that this
teacher had made no effort whatsoever to use the information
furnished by the tests as a basis for correcting the many weaknesses
shown among her pupils. There is no special magic in standard 
tests which will work any educational miracles on pupils who take 
them. The teacher must be made to realize that the results from 
the tests are for her information. Unless she is shown how to make 
use of them, and is willing to do so, standard testing is not worth 
while. On the other hand, if the teacher is willing and intelligent, 
the information derived from standard tests is of the greatest ser- 
vice. For it enables her to reach desired goals by the most direct 
path, with the least expenditure of energy and labor on her part 
and with the greatest benefit to the children. 



CHAPTER V 

BUREAUS OF RESEARCH IN CITY SCHOOL SYSTEMS 


Eugene A. Nifenecker

Assistant Director, Bureau of Research and Reference, New York City


The city bureau of educational research is the direct and logi- 
cal outcome of the combination of the survey movement and the 
movement for the use of measurement. A careful appraisal by a 
group of experts from without has been demanded and paid for by 
community after community. Very evidently such stock-taking is 
regarded by the public as beneficial in its effects. Too often, how- 
ever, the result, as far as the local schoolmen are concerned, has been 
disastrous. The time spent by the experts in the local field is nec- 
essarily short, their knowledge of local conditions necessarily lim- 
ited and their interpretation of results has sometimes seemed to lo- 
cal school officials unwarranted and unjust. More often than not, 
superintendents have attempted to explain away the defects re- 
vealed and to make light both of the facts and of the recommenda- 
tions of the survey experts. But facts are stubborn things, and 
changes of superintendents and upheavals and reorganizations of 
various sorts have inevitably followed the adoption of such a policy. 
So it has come about that many times a survey report has been of 
more direct benefit to schoolmen in other cities than to those in the 
city for which it was made. 

Thoughtful superintendents have come to see that, above every- 
thing else, the best insurance against the survey lightning is a sur- 
vey conducted from within. Many a schoolman has asked himself 
the searching questions: “What would a survey reveal about my
school system? On a factual basis, what do I really know about my
own work?” Straightway he has begun to investigate for himself. The
various types of studies made in a modern survey are repeated on a 
smaller scale. Measurements are made and constructive attempts at 
remedial adjustments are begun. 

The average administrator, however, is a busy man. He has
little time, and by virtue of his training and position, little aptitude 
for close analytical studies or statistical investigation. Ordinarily 
he himself does just enough to realize the importance and value 
of the work, then delegates it to a specially trained person who 
works under his direction. In the smaller towns and cities this per- 
son may be only a clerk or bookkeeper who performs the mathemat- 
ical and statistical labors involved. In the medium-sized cities a 
supervisor or assistant superintendent is often detailed to give part 
or all of his time to the work, and in the larger cities there are for- 
mal organizations of special research departments. 

One of the best formulations of the functions of such research 
work is to be found in the report of the Committee on School In- 
quiry, New York City. Professor E. C. Elliott, in his study of the
administration of the New York schools,¹ made, among others, the
following recommendation:

“Recommendation III

“That there be established as an integral part of the system of
school control, a Bureau or Division of Investigation and Appraisal. 

“This bureau or division should be in charge of a chief or 
superintendent, who is directly responsible to the Board of Edu- 
cation, and should be organized in such a manner as to enable it 
to serve as the central agency for the gathering and interpretation 
of statistical and other data with reference to the schools ; and also 
for the carrying on of such investigations as are necessary for the 
rational development and expansion of the school system. It should 
bear the same general relation to the Department of Education as 
the existing Bureau of Municipal Investigation and Statistics bears 
to the Department of Finance. 

“The following arguments may be indicated: 

“(1) The school system of the city suffers from a lack of definite,
detailed knowledge of its own working and its own cost. As
has already been pointed out, the fundamental importance of the 
inspectorial form of control has been recognized only to a very lim- 
ited extent. And even where its importance is recognized, officials 
charged with the responsibility for administrative or supervisory 
duty appraise their own performances. Investigation that is needed 
is not carried on at all. 


¹Report on Educational Aspects of Public School System of the City of
New York to the Committee on School Inquiry of the Board of Estimate and
Apportionment, Volume II, page 401.

“(2) It is evident that one of two things will result in the
immediate future. Either the work indicated for this proposed 
bureau will be attempted by agencies outside of the school system 
or else there must be established, within the school system, as an 
integral part of its organized control, an agency properly equipped 
with trained investigators to set forth to the supervisory and ad- 
ministrative officials of the school system, and the people of the 
city, those essential facts absolutely necessary for the intelligent 
development of schools and of public sentiment. Of these alterna- 
tives, it would seem that the latter is to be greatly preferred. No 
outside agency could carry forward the work of inspection and of 
formulating impartial judgments of results, and of proposing new 
procedures without much friction and loss of energy. 

“(3) The problems of public education in New York City are
not conventional problems. Many of the more pressing ones are 
new in the social and educational world. They cannot be solved by 
preconceptions, or the showing of hands. In so far as possible, 
the situation and causes that have generated these problems must 
be weighed and analyzed before rational and permanent solutions 
can be found.”

This recommendation was immediately acted upon. A Division 
of Reference and Research was formally organized and began its 
work at the opening of schools, September, 1913. The quotation 
above, therefore, is of historical interest. For while the New York 
Bureau by no means represents the beginning of systematic, scien- 
tific study of school problems by school authorities, it was undoubt- 
edly the first to be definitely organized for that sole purpose. 

The real beginning of the movement that has led to the es- 
tablishment of bureaus of research cannot be clearly traced. In 
some cities there have been for many years committees, bureaus and 
special commissions for the more or less systematic study of build- 
ings, of children, of teachers, of instruction, and of many other 
forms of school activities. As rapidly as the movement for meas- 
urement has developed, the investigations of these agencies have be- 
come more and more truly scientific. Even today the organization 
and function of a bureau of research are not clearly defined, and all 
sorts of studies are being carried on by all types of workers. In some 
cities, the department consists of little more than a high-sounding 
title bestowed as a compliment upon some existing school officer. In 
others, the director of the bureau is given the rank and salary of an
assistant superintendent and is a real director of an extensive
department. Between these two extremes every type of variation may
be found.

It is not surprising, therefore, that the actual work of a bureau 
of research varies from city to city, being determined in the main by 
the tastes and training of the director, and somewhat by local needs. 
In some cities attention has been given almost wholly to a study of 
costs and to purely administrative problems ; in other cities the en- 
ergy of the department has been expended only on the measurement 
of educational products; in still others the deliberate attempt has
been made to do some work in every field. In general, the tendency 
seems to be to regard the special function of the department as the 
devising, giving, scoring, tabulating and interpreting of standard 
tests and the prosecution of such other investigations only as may 
aid in the interpretation of the results secured. 

A city department of research ordinarily consists of a director 
and clerical or stenographic assistants. At first, the tendency was 
to select any capable schoolman available, with little regard for his 
qualifications for the special work, but later appointees, either as 
directors or assistants, have been young men or women specially 
trained in statistical methods and in educational measurements. The 
director, almost without exception, is responsible directly to the 
superintendent and under his immediate control. The salaries paid
range from $1,100 to $6,000 (median of 13 cases, $2,700).

It has proved very difficult to obtain complete information with 
regard to the number of cities which have organized bureaus or 
which are carrying on organized work in measurement.² In spite
of repeated questionnaires and persistent efforts, only a most tenta- 
tive list can be given. However, formal organizations are found in 
the cities listed on the following page. 

No attempt has been made to list cities in which research work 
is being carried on without formal organization, because of the im- 
possibility, both of making the list at all complete and of distinguish- 
ing between a mere incidental use of tests once from idle curiosity 
and their persistent, intelligent use, year after year, for worthy 
ends. Seattle, for instance, does all the work and has all the benefit of 
a bureau of research, but has no formal organization. The assistant
superintendent in charge is a director of educational research in all
but name. There are many such cities throughout the United States.
At the other extreme is the superintendent or teacher whose curiosity
is stimulated by some talk or article, and who gives a test once to a
single class. The sales of standard tests have grown to very great
proportions. Last year nearly 900,000 copies of a single popular test
were used, and the annual sale of each of a few other tests runs well
over 100,000 copies. Nor is the use of tests confined to this country;
shipments are made to all quarters of the world. It seems quite
probable, therefore, that the number of bureaus of research is
destined to be greatly increased in the immediate future.

²The various functions of a bureau of research are illustrated in other
chapters, and the discussions will not be repeated here.

PARTIAL LIST OF CITY BUREAUS OF RESEARCH

City                    Title of Bureau                                          When Organized  Name of Director
 1. Baltimore, Md.      Bureau of Statistics and Research                                        Edwin Hebden
 2. Boston, Mass.       Department of Educational Investigation and Measurement  1914            Frank W. Ballou
 3. Buffalo, N. Y.      Bureau of Research                                       1916            Wm. A. Mackey
 4. Chicago, Ill.       Department of Standards and Statistics                   1917            S. B. Allison
 5. Cleveland, Ohio     Department of Reference and Research                     1916            C. W. Sutton
 6. Detroit, Michigan   Department of Educational Research                       1914            S. A. Courtis
 7. Hibbing, Minn.      Department of Educational Research                       1915            J. W. Richardson
 8. Kansas City, Mo.    Bureau of Research and Efficiency                        1914            Geo. Melcher
 9. Louisville, Ky.     Psychological Laboratory                                 1914            Henrietta V. Race
10. Los Angeles, Cal.   Division of Research                                     1917            Robert Lane
11. New York City       Bureau of Research and Reference                         1913            E. A. Nifenecker
12. New Orleans, La.    Bureau of Educational Research                           1912            (Discontinued)
13. Oakland, Cal.       Bureau of Reference and Research                         1914            Virgil E. Dickson
14. Omaha, Neb.         Bureau of Educational Research                           1917            H. W. Anderson
15. Rochester, N. Y.    Efficiency Bureau                                        1912            J. P. O'Hern
16. St. Paul, Minn.     Bureau of Research and Efficiency                        1917            L. L. Everly
17. Schenectady, N. Y.  Bureau of Research and Efficiency                        1913            H. L. Davenport
18. Topeka, Ks.                                                                  1916            Ira J. Bright



CHAPTER VI 

COOPERATIVE WORK FROM A UNIVERSITY CENTER 


Ernest J. Ashbaugh 

Director Educational Service, Extension Division, State University of Iowa 


One of the most significant developments in the university in 
the last twenty years has been the growth of the idea of service — 
service not only to the comparatively few who were able 
to come to the campus, but service to the many who were not able 
to come. The idea was developed largely in state universities, where 
the funds for support came from direct taxation of all the people of 
the state and where the leaders came to realize that the service of 
the university should be extended to all who were taxed to support 
it. Only recently, however, has the university as a whole made an 
organized effort to educate all the people in its territory and to ren- 
der service in the solution of the problems of its supporters off the 
campus. In fact, some of our state universities are doing almost 
nothing in these lines at the present time. 

One of the very latest lines of service to be developed is that 
of cooperative educational research. The movement is so com- 
pletely in its infancy that the method and the organization of the 
work have been but imperfectly worked out. However, two aims 
are rather definitely agreed upon by those engaged in this work, 
namely : (1) to make the university bureau a center for the direc- 
tion of cooperative work with the school people of the state in the 
solution of the problems in which the latter are most interested; 
(2) to make the bureau an agency for the collection of rough data; 
for the tabulation, organization, and interpretation of these data ; 
and for the distribution of the results of the study to the people 
contributing and to others, in order that significant facts may be 
known by the workers in the field. 

A third aim might be added (though it must necessarily be subservient
to the other two): that of gathering masses of scientific data which
may be used in the further study of educational problems.

Nearly all problems connected with public-school work will lend
themselves to cooperative research once the university and the
school people in the state come to a close understanding. The early
efforts (the collection of data on school attendance, age-grade and
age-progress studies, school health and causes of absence; the use of
standard tests in arithmetic, reading, spelling, etc.; the figuring of
unit costs of instruction; and the scoring of buildings and physical
equipment) together represent but a small portion of the field that
may be entered with profit.

The length of the period of compulsory schooling is increasing, 
but it does not suffice merely to attend longer. Much needs to be 
known concerning the extent to which communities and states are 
fulfilling their obligation to all their children: the obligation of
giving them seven to twelve years of educational opportunity of a kind
which will function in self-support and a wholesome social attitude. 
We need to know not only the extent to which this opportunity is of- 
fered, but also the extent to which this opportunity is being utilized. 
Cooperative research from a university center could obtain this 
knowledge. 

We are in a time of critical questioning concerning the mater- 
ials of education and the quality of the product turned out by our 
schools. The whole field of the school curriculum is open for re- 
search. Experimental methods certainly ought to be used in de- 
termining such things as time allotment, material to be used in each 
of the various grades and the arrangement of this material through- 
out the course. It is possible that method of presentation might also 
be included. This experimentation could be advantageously di- 
rected in cooperating schools by the university research bureau. 
Each problem would be specifically stated and the technique of 
procedure worked out at the university. The bureau would have 
the constructive criticism of the faculty in education and the pre- 
liminary experimentation would be carried on in the university 
experimental school. This would prepare the way for most effective
work in the public schools which were cooperating with the bureau.
As results were secured, each advanced step would thus become
immediately available to all the schools of the state. Many
city superintendents are experimenting along some of these lines, 
but their experiments are not coordinated. Cities having bureaus 
of research are doing more extensive and critical work. But under 
the leadership of the university it would be possible to conduct 
such experiments under the varied conditions of small and large
city, with the typical schoolroom conditions of each, and thus render 
a service to all communities in the state. 

It is not the purpose of this chapter to list all the problems 
of cooperative research nor to give in detail the manner of solution. 
The purpose is rather to state the aims of a research bureau in a 
university center and to note very briefly some of the larger fields
that await attack. 

Development of the Iowa Bureau of Educational Service 

Method — The fundamental aims noted above have been kept
constantly in mind in the development of the Bureau of Educa- 
tional Service in the Extension Division of the University of Iowa. 
The work was started in the fall of 1914, and the writer has been in 
charge of the work since its beginning. Some time was spent in 
traveling over the state and talking with the superintendents and 
thus learning at first-hand the most promising lines for initial en- 
deavor. 

The field had been partially prepared by courses in tests and 
measurement given in the College of Education during the preced- 
ing two years. Schoolmen attending the University had gained 
some knowledge of school surveys and some interest in cooperative 
work. The hearty cooperation of the faculty in education has been 
one of the greatest assets of the Bureau. In classroom, in institutes, 
and at teachers’ associations they have furthered the work through 
frequent reference to the Bureau and its activities. The director of 
the Bureau holds rank in the College of Education and is a member 
of the instructional staff during the summer session. He thus 
comes in contact with the superintendents and principals who take 
summer work and assists in training them for further work in the 
field. 

The annual conference on supervision, held at the University
under the joint auspices of the Extension Division and the College 
of Education, has contributed much to the development of the work. 
It has brought together a group of school people (increasing in 
number from year to year) who are primarily interested in the 
problems of supervision. Such men as Judd, Strayer, Coffman, 
Ayres, Bagley and others have appeared upon the programs and 
promoted the interest in scientific education. The director of the 
Bureau has had a place at one general session each year and pre- 
sented the results of research studies. At this conference, at the 
State Teachers' Association and at the sectional educational meet- 
ings each year, the director has met superintendents, principals and 
teachers in general sessions and round-table discussions. He has 
also met another group of men and women, the county superin- 
tendents. These persons, almost the only supervisors of the rural 
teachers, are becoming more and more interested in measurement 
of results, and their cooperation is being secured. 

In a word, the secret of the development of the Iowa Bureau 
has been in the establishment of cordial cooperative relations with 
the various educational agencies of the state. 

Fields — The first research problem attacked was that of at- 
tendance and the second, the measurement of school progress through 
the use of standard tests. In February, 1915, an arrangement was 
made with Mr. S. A. Courtis whereby the Bureau has exclusive con- 
trol of his Series B tests in the state. In November of the same year a 
similar arrangement was made for the handling of the Kansas Silent 
Reading tests. These arrangements are still in force. Meanwhile, 
the Bureau has kept a small quantity of other tests on hand and 
stood ready to secure any others upon request. In addition to fur- 
nishing the superintendents of the state with the test material at 
cost, the Bureau has stood ready to give personal service in the giv- 
ing of these tests. The writer has gone into a number of schools and 
demonstrated to the superintendents and principals the method of 
giving, scoring, and interpreting the results of these tests. This has 
resulted in the schoolmen becoming very much interested in the new 
field of tests and measurement. Many of these men and women have 
since studied at the University to increase their knowledge along 
these lines. 



A third activity developed has been an information service 
by which the Bureau attempts to give, through its contact with the 
various departments of the University, technical information on 
any problem connected with the schools of the state. Questions on 
buildings, heating, lighting, ventilation, playgrounds, health ser- 
vice, course of study, census, finance, etc., have come from school 
boards, superintendents, teachers and patrons, and the Bureau has 
transmitted to the inquirers the best information available. 

Another feature of the work of the Bureau has been local 
school surveys. On joint invitation of the school board and super- 
intendents, a survey of any phase of a local school problem will be 
carefully made, the results analyzed and recommendations rendered. 
Care is taken here not to encroach upon the field of legitimate pri- 
vate enterprise. For example, service that belongs definitely to the 
field of an engineer, architect, or public accountant is not given. 

Present Status of Work 

As a result of the activity of the Bureau during the three and a 
half years of its existence, its status has been quite firmly estab- 
lished. At first there was a question in the minds of many, both in 
and out of the university, whether such a bureau was a legitimate 
part of the activity of a university and perhaps even more of an 
Extension Division. At present the doubt no longer exists in Iowa. 
The Bureau, through its service to the superintendents and school 
boards, has settled this question affirmatively.

It is understood at the University that it is the function of the 
Bureau of Educational Service to conduct researches in the field 
of education looking toward the promotion of efficiency in school 
work. All reasonable assistance will be given by various departments 
and colleges of the University to the furtherance of this work. This 
cooperation has been secured through a sincere effort to ask only rea- 
sonable assistance and to give full credit to the college, department 
or individual which rendered assistance to the Bureau. 

The relation of the school people of the state to the Bureau has 
always been that of voluntary cooperation. No system of university 
credit for work done or other means of stimulating their
cooperation has been offered. At all times there has been a definite
understanding between the director of the Bureau and the
superintendents that each superintendent should decide for himself whether he
ought to cooperate in any proposed research. If he believed that 
the result of the research would be sufficiently valuable to his own 
school to justify the expenditure of the time and effort required, 
the Bureau would be very glad to receive his contribution. If he 
did not believe this would be the case, the most friendly relationship 
was maintained, and the invitation was repeated when another line 
of research work was undertaken. Thus, each superintendent in 
the state has come to look upon the Bureau as asking for only such 
assistance as shall contribute directly to the solution of his own 
problems. 

Results Accomplished 

The results accomplished fall rather definitely into three 
groups, according as they pertain to (1) state-wide surveys, (2) 
local surveys, and (3) general service. 

State-wide Surveys — 

(1) A state-wide survey of handwriting involving 110 cities
and towns and rural pupils from fourteen counties was made in
1915 and the results issued in bulletin form.¹ The following six
questions were asked and answered on the basis of the information 
secured by the survey : 

1. How well do Iowa school children write? 

2. Do children improve their quality of writing regularly 
as they progress through the grades? 

3. Do children attending school in towns and cities write bet- 
ter than those attending the rural schools ? 

4. Do the children in the larger cities write better than those 
in towns or smaller cities? 

5. How do children in this state compare with children in 
other states? 

6. Is the quality of writing of the average eighth-grade child 
sufficient to satisfy the ordinary demands of every-day life outside 
of school ? 

The samples of writing were scored by the Ayres Handwriting
Scale² and the conclusions were as follows:

¹E. J. Ashbaugh, Handwriting of Iowa School Children, Extension
Bulletin No. 15, State University of Iowa.

²Leonard P. Ayres, A Measuring Scale for Handwriting, Russell Sage
Foundation.

1. Iowa school children in the eight grades write approximately
at Qualities 30, 35, 40, 45, 50, 52, 57, and 60, respectively,
on this scale. Reference to the scale is necessary to understand the
meaning of these values.

2. Yes. The improvement is quite uniform through the lower 
grades, but less rapid in the upper. 

3 and 4. No. The differences between the quality of writing 
of children attending the rural school, small town and cities are 
negligible. 

5. On the average, Iowa children are writing as well as chil- 
dren of like grade elsewhere in the United States. 

6. Apparently the quality of writing of the majority of eighth- 
grade children will satisfy the ordinary demands of daily life, since 
75 percent of these children write a better quality than is required 
by the New York Municipal Civil Service Commission. 

(2) A similar survey of achievement in the fundamentals of 
arithmetic as measured by the Courtis Series B tests was made in 
1916 and the results distributed in a bulletin.³ The following four
questions were proposed and answered in this bulletin : 

1. How skillful are Iowa children in performing the four fun- 
damental operations in arithmetic? 

2. How does the skill of Iowa school children compare with 
that of children of like grades in other states? 

3. How does the skill of children in small towns compare with 
that of children in larger towns and cities? 

4. What use can be made of Standard Tests ? 

1. The median speed and accuracy of children were ascertained
in each of the grades. Mr. Courtis’ standard, as well as the
scores of each of the cities contributing data, was furnished for
purposes of comparison. The evidence indicated that more speed
was needed in the upper grades and greater accuracy in all.

2. Iowa school children were shown to excel in most grades and 
operations when their scores were compared with the available 
records of sister states. 

3. While, in general, the scores of pupils in smaller towns are 
lower than those of pupils in larger places, the records of some small 
towns showed clearly that size of place is not a determining factor. 

On the basis of these showings many schools in the state have 
modified their courses in order to give greater attention to drill 
work in fundamentals. 

³E. J. Ashbaugh, Arithmetical Skill of Iowa School Children, Extension
Bulletin No. 24, State University of Iowa.

(3) Less elaborate studies have been made of achievement in
reading and spelling, attendance, causes of absence and teachers’ 
marks. In reading and spelling, the general state situation seems to 
be average or slightly above when compared with other states. 
Individual cities have discovered weaknesses through these surveys 
and modified their practice accordingly. The attendance survey 
showed an amount of absence almost incredible to superintendents
and teachers: 20 percent of the children were out more than 10 percent
of the time, 10 percent of the children more than 20 percent of the
time, and 6.6 percent of the children more than 30 percent of the
time. This is not a satisfactory situation. High ‘percentages of
attendance’ have been secured through the method of dropping a
child’s name from the roll after a period of a day and a half to
three days’ absence. This, however, does not make for increased
school attendance by the child.
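The arithmetic behind this objection can be made concrete with a small sketch. It is not taken from the survey: it assumes that “percentage of attendance” means total days attended divided by total days of membership (days on the roll), models the practice of dropping an absent child from the roll with a hypothetical drop_after parameter, and uses two invented pupil records.

```python
def percentage_of_attendance(pupils, drop_after=None):
    """Days attended as a percentage of days on the roll.

    Assumed definition (not from the source): attended days / membership days.
    pupils: one list per pupil of daily marks, True = present.
    drop_after: hypothetical drop rule; once a pupil has been absent this many
    consecutive days, further absent days no longer count as membership.
    """
    attended = 0
    on_roll = 0
    for days in pupils:
        absent_run = 0  # consecutive days absent so far
        for present in days:
            if present:
                absent_run = 0
                attended += 1
                on_roll += 1
            else:
                absent_run += 1
                # A dropped pupil's absences vanish from the membership count.
                if drop_after is None or absent_run <= drop_after:
                    on_roll += 1
    return 100.0 * attended / on_roll


# Two invented pupils over a 20-day period.
pupils = [
    [True] * 20,                 # present every day
    [True] * 10 + [False] * 10,  # absent the last ten days
]
print(percentage_of_attendance(pupils))                          # 75.0
print(round(percentage_of_attendance(pupils, drop_after=3), 1))  # 90.9
```

Both calculations count the same 30 days actually spent in school; the higher figure comes only from removing seven absent days from the roll, which is why a high percentage obtained this way says nothing about increased attendance.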

In all these studies the random selection of cities and the large
number of pupils involved make it probable that the results are
typical of the state. Definite readjustments are known to have
been made in a number of school systems on the basis of the facts 
revealed. 

Local Surveys — 

(1) In response to direct invitations by superintendents and 
school boards, more or less complete surveys of four Iowa school 
systems have been made. Written reports have been rendered to 
the school boards in each case. 

(2) Building surveys involving the question of utilization of 
present buildings and the best solution for caring for the increased 
school population have been conducted in three cities. 

Local surveys are possible only by the closest cooperation with 
teachers, superintendents and school boards. Meetings with teach- 
ers are arranged where the results of the surveys of instruction are 
given and the interpretation is carefully explained. These meetings 
enable the teachers to apply the survey results so as to increase class- 
room efficiency. Meetings with the school board are also arranged 
so that information may be given them that will help them to ful- 
fill their functions to the greater good of the schools. 

General Service —

(1) The various standard tests have been furnished at cost to 
the schools of the state. This has kept the Bureau in touch with all 
schools doing measuring work and fostered the cooperative
relationship.

(2) An informational service has been established which en- 
deavors to be a source of help in the solution of school problems. 
This service is free and open to anyone within the state. 

The response of the school superintendents of the state to 
any projected cooperative work, the large number of calls for sur- 
veys of various kinds, and the utilization of the obtained results, 
furnish the best evidence of the value of the work that has been ac- 
complished. The fact that calls are still coming in for studies issued 
more than two years ago indicates that the work of the Bureau is 
considered as having more than temporary value. During the three
and a half years, cooperative relationships have been established with
more than 100 different cities and towns of the state. At the present
time I know that when an invitation is extended for cooperative
work, a truly representative number of cities and towns can be 
relied upon to furnish the desired information. 

Difficulties 

A fundamental difficulty, and one that often tends to invali- 
date results, is that a large number of teachers, principals and su- 
perintendents have not had an opportunity for training in scientific 
research work. Even with the very best intention to cooperate, they
frequently misunderstand directions or fail to appreciate the importance
of following them exactly, and thus a variable factor enters into the
work. This difficulty can be overcome only by extreme care and
preliminary experimentation in the formulation of directions.

Another problem, perhaps even greater than that of securing
reliable data, is presented by the question: How shall the results
of researches be reported in such a way that the greatest possible
good may come to the schools? The purpose of the research is the
modification of schoolroom practice. In most cases this will be
accomplished only when the results reach the teacher in a form which
is clear, definite and usable. It must be remembered that the
movement in scientific education in recent years is demanding a new
type of supervisor, one trained in the handling and interpretation
of scientific data. But until the universities, colleges and normal
schools are able to give us this trained corps, and a representative
is found in every school, it will be necessary to depend largely upon
the classroom teacher for the application of the report to practice.
Hence, special effort should be made, first, to present the report
in such form that the ordinary teacher may clearly see its meaning
and be inspired to utilize the results in her own classroom, and
second, to distribute the report in such a manner that it comes
directly to the attention of the teacher herself.

The difficulty of securing adequate assistance in caring for the 
work presents itself to nearly every bureau. To get reliable data 
in many studies it is often more necessary to score the papers than 
to give the tests. This requires much time and clerical assistants 
with a specific kind of training. Where the director of the bureau 
is on the regular teaching staff of the institution, he may personally 
solve this difficulty by the use of research data as laboratory mater- 
ial with his classes. This enables him to train his help and give his 
students very valuable experience with this kind of work at the 
same time. A counter-difficulty arises with this plan in that the 
director is not free for extended absences over the state as calls may 
require. 

State surveys that involve the collection of material in many 
schools and the compilation and interpretation of results at the cen- 
tral bureau present only the difficulties of securing the needed co- 
operation in the field and the necessary assistance at the bureau. 
These can be met. But local surveys present a well-nigh insupera- 
ble difficulty in the element of time. Even with a large staff, exten- 
sive local surveys could be made in only a few cities in a year. For- 
tunately, many features of a local survey can be done as well by 
the local superintendent, if he be a trained man, as by a represen- 
tative of the bureau. Hence, a partial solution to this difficulty lies 
in the training of superintendents. 
Suggestions

On the basis of three and a half years' experience with the Bureau
at the University of Iowa, and of frequent conversations with those
in charge of similar bureaus elsewhere, I offer the following
suggestions to those who contemplate the creation of a similar bureau:

1. The bureau should be so financed as to make available 
sufficient funds to care for the immediate tabulation of the results 
of any study conducted by it. 

2. A plan should be devised by which the director of the bu- 
reau and persons interested in its cooperative activity may meet 
for consultation one or more times during the year. These meetings 
might be arranged in connection with the State Teachers’ Associ- 
ation or other educational gathering at which a large group of 
school superintendents would be present. 

3. The director should be a member of the teaching staff of 
the university during its summer session so that through the class- 
room he may increase the interest in the work of the bureau and 
assist in the training of teachers, principals and superintendents 
for educational investigation. 

4. Results of researches should be presented to the public, in 
a clear and forcible manner, attractive to the lay reader and in a 
form usable by the average teacher. Specific problems should be set 
up to which specific answers are given. These problems and their 
answers at least should be intelligible to all readers, even though 
some readers may not be able to follow clearly all the processes by 
which the results are obtained. 

5. With the issuance of the studies of the bureau, arrange- 
ments should be made for a publicity campaign that will focus the 
attention of the school people upon these results. 

Historical Note 

Pioneer efforts along the line of cooperative investigation in 
educational measurements were made in 1910 by S. A. Courtis in 
arithmetic. One of the outgrowths of his activities was the taking 
over of the research work within states by state universities. The 
first university to recognize the possibilities of service through the 
formation of a bureau of cooperative research was the University
of Oklahoma, in 1913. At this time, however, cooperative work was 
also under way in the University of Indiana, and the formal organi- 
zation of the Bureau of Cooperative Research took place the follow- 
ing year. Similar bureaus at the University of Iowa and at the 
State Normal School, Emporia, Kansas, were organized a little later. 
A partial list of university bureaus of research (with sufficiently
formal organizations to have stationery of their own!) is as follows:


University               Name of Bureau              When Organized  Name of Director
Univ. of Arkansas,       Bureau of Educational       1917            J. R. Jewell
  Fayetteville, Ark.       Tests and Measurements
Univ. of Indiana,        Bureau of Cooperative       1914            (Position unfilled at present)
  Bloomington, Ind.        Research
Univ. of Iowa,           Educational Service,        1914            E. J. Ashbaugh
  Iowa City, Iowa          Extension Division
Univ. of Kansas,         Bureau of Educational       1916            F. J. Kelly
  Lawrence, Kan.           Measurements
Univ. of Minnesota,      Bureau of Cooperative       1915            M. E. Haggerty
  Minneapolis, Minn.       Research
Univ. of Nebraska,       Bureau of Educational       1914            Charles Fordyce
  Lincoln, Neb.            Measurements
Univ. of Oklahoma,       Bureau of Measurements      1913            W. W. Phelan
  Norman, Okla.            and Efficiency
Univ. of South Dakota,   Bureau of Educational       1915            W. Franklin Jones
  Vermillion, S. Dak.      Research


It must not be supposed, however, that the universities that 
appear in the list above are the only ones actively supporting the 
movement. From Harvard and Columbia Universities in the east, 
to Leland Stanford Junior University in the west, from the Uni- 
versity of Wisconsin in the north to the University of Texas in the 
south, similar work is being done by schools and departments of 
education. If a distinction is to be made at all, it is that the univer- 
sities that have formal bureaus usually act as distributing centers 
for testing material, but even this distinction does not always hold. 
The influence of university men in education has naturally been 
one of the main factors that have led to the growth of the movement. 

The university, moreover, has by no means been the only factor.
Normal schools and teachers' institutes have done their share. At
present bureaus of research are to be found in at least two normal
schools:








Normal School              Name of Bureau              When Organized  Name of Director
State Normal School,       Bureau of Educational       1914            Walter S. Monroe
  Emporia, Kan.              Standards and Measurements
Northern Normal and        Bureau of Educational       1917            Willis E. Johnson
  Industrial School,         Research
  Aberdeen, S. Dak.


Still another agency for the development of educational research
throughout a state has been the state department of education. In at
least three states, attention to research work is the special duty of
a particular member of the staff.


State        Name of Bureau              When Organized  Name of Director
New York     State Dept. of Education                    Wm. A. Averill
Wisconsin    State Dept. of Education    1915            W. W. Theisen
Georgia      State Dept. of Georgia                      M. L. Duggan


Probably this is the proper place, also, to comment on the aid 
rendered by the U. S. Bureau of Education. Bulletins and reports 
of committees on standard tests and scales have been printed; surveys
have been conducted, and in many ways the Bureau at Washington
has done what it could to further the cause of measurement.

Finally, tribute must be paid to the influence of the great foun- 
dations. The Russell Sage Foundation, through its Division of Edu- 
cation, under the direction of Leonard P. Ayres, was the pioneer in 
this field and has been one of the major influences responsible for 
the rapid development of the movement for measurement. The 
Ayres Scales in writing and spelling are widely used, while the in- 
vestigations of the Division in the field of cost-accounting and child- 
accounting have had even greater influence. The survey work car- 
ried on by the Division has been a third type of activity that has 
had a great effect in directing the minds of school men to the possi- 
bilities of measurement. The contributions of the foundation to 
this field are very great. 

Of recent years, more and more attention has been given by
other foundations to the field of educational research. The Cleveland
Foundation, although of local origin and interest, has, through
the Cleveland Survey, made its contribution to the educational
progress of the country. The Carnegie Foundation has recently
entered the field.

The General Education Board, also, has given many evidences
of its interest in the scientific study of educational problems. It
has supported the New Hampshire Bureau of Educational Research,
subsidized research work at the University of Chicago and in
other places, conducted surveys in the Maryland and Gary schools,
and is now carrying on an experimental school.

It seems clear, therefore, that in the future the development 
of educational measurement should be even more rapid than in the 
past. At one extreme are great foundations willing to expend large 
sums of money for any educational investigation that promises to 
yield results of permanent value. At the other extreme is a vast 
army of educational workers actively engaged in teaching children 
and anxious to make use of every tool, device, or method, that will 
help them to do better work. Between the two are the universities, 
colleges and normal schools, training men both to carry on research 
work successfully and to apply the results of experimental studies 
to the practical problems of the schoolroom. 



CHAPTER VII

EXISTING TESTS AND STANDARDS 


Walter S. Monroe 

Director, Bureau of Educational Measurements and Standards, Kansas State 
Normal School, Emporia, Kansas 


During the past decade and especially during the past five 
years, the number of tests available for measuring the abilities of 
children in school subjects has grown very rapidly. Those de- 
scribed in this chapter are as follows: 


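STANDARDIZED TESTS FOR USE IN THE ELEMENTARY SCHOOL

Arithmetic ................ 17
  (Fundamental Operations 11)
  (Arithmetic Reasoning 6)
Drawing ...................  1
Geography .................  6
Handwriting ............... 10
History ...................  4
Language
Music
Silent Reading ............ 13
Oral Reading ..............  4
Spelling
Total

STANDARDIZED TESTS FOR USE IN THE HIGH SCHOOL

Algebra .............  7     History .................  1
Drawing .............  1     Physical Training .......  1
Foreign Languages ... 11     Physics .................  1
Geometry ............  3     Total ................... 25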

The superintendent or teacher who wishes to measure the results
of instruction faces the problem of making a wise choice from
this material. This account of existing tests and standards has been
prepared to assist in making this choice. Because of the limitations
of space, only brief descriptions are possible. In the chapter
prepared by Miss Bryner the reader will find references to the accounts
of the derivation of all the tests and of their use, whenever they are
available. Most of the tests are also described more completely in
the writer's book, Educational Tests and Measurements (Houghton
Mifflin Company, 1917).
The standards given in several cases are tentative only, and
all are, of course, subject to revision. Unless standard scores are
based upon a large number of cases, they should be used with
caution in making comparisons. In all cases it should be remembered
that any standard which is simply a statement of the consensus of
present practice is open to the criticism that we must not assume
that what is, is what should be. For example, it may be that a high
degree of efficiency requires a much higher standard in the rate of
silent reading than the average rate at which children now read in
the several grades.

It should also be remembered that the scores upon which these 
standards are based were obtained by following certain definite 
directions in giving the several tests. Even slight variations in
procedure often materially affect pupils' scores. Hence, when
comparison with standard scores is the object of the testing work, one
should follow the standard, or specified, directions which in most
cases accompany the tests. In any case, in making comparisons it
is unwise to attach great importance to small differences in scores.*

An effort has been made to make this list as complete as possi- 
ble, but doubtless some tests have been overlooked, and it is certain 
that within a short time new tests will be announced. In fact, sev- 
eral were found that were in the process of derivation. Certain 
special tests which have been devised for laboratory research are 
intentionally omitted because they are not available for distribution 
and in general are not suitable for the type of testing the superin- 
tendent or teacher will do. Practice tests or other exercises which 
are primarily teaching devices have also been omitted. 

The tests may be obtained from the addresses given. When no
address is given, the tests are not available for general use. In
a few cases, even where an address is given, the tests are not available.
On account of recent fluctuations in the cost of printing, prices are 
not stable. Accordingly, no prices are given. However, in only a 
few cases are the tests published on a commercial basis and for this 
reason one may be reasonably certain that the tests may be ob- 
tained at approximately the cost of printing. 

*S. A. Courtis, Third, Fourth and Fifth Annual Accountings, 1913-1916
(Department of Cooperative Research, Detroit). Read especially the
warning given on page 53.
Standardized Tests for Use in the Elementary School

I. Arithmetic, Fundamental Operations 

1. Bobbitt Arithmetic Tests. These tests were used in the 
survey of the public schools of San Antonio, Texas. They consist 
of nine tests, five for the operations with integers and four for the 
operations with common fractions. Each test is limited to one
operation but contains examples of several different types; the
examples, however, are arranged in groups of equal difficulty. (Ref.
546.)*

2. Boston Tests. Addition of Fractions. These are a series 
of six tests devised by A. W. Kallom and are significant for the il- 
lustration they furnish of tests based upon a scientific analysis of 
the abilities they measure. This analysis revealed fourteen types 
of examples in the addition of two fractions, but by making certain 
combinations the number of tests needed to measure this group of 
abilities was reduced to six. A similar series has been devised for 
subtraction of fractions, and it is planned to extend the work to 
multiplication and division. (Ref. 78.) 


Standards: Boston Medians, Addition of Fractions

                 Test 1        Test 2        Test 3        Test 4        Test 5        Test 6
Grade   Pupils   Speed  Acc.   Speed  Acc.   Speed  Acc.   Speed  Acc.   Speed  Acc.   Speed  Acc.
VI      1205     10.7   79.6   7.7    65.6   5.5    41.9   4.0    69.5   4.6    51.0   4.4    4?.6
VII     1243     16.5   86.6   10.1   72.9   7.3    46.1   5.3    69.2   6.3    54.9   5.7    48.1
VIII    1130     20.7   88.2   11.6   74.4   8.4    47.4   6.0    67.8   6.9    52.4   6.4    46.5


3. Cleveland Survey Tests. These were designed for use in 
the survey of the Cleveland Public Schools. They have been revised 
slightly and used in the surveys at Grand Rapids and St. Louis. 
The series consists of fifteen tests, including four in addition, two in 
subtraction, three in multiplication, four in division, and two in 
addition and subtraction of common fractions. The total working 
time is 22 minutes and the administration of the tests is simple. 
They furnish a more detailed analysis than can be secured by means 
of the Courtis Standard Research Tests, Series B. Address Charles
H. Judd, School of Education, University of Chicago, Chicago,
Illinois, or S. A. Courtis, 82 Eliot St., Detroit, Michigan. (Ref. 403.)

*Reference numbers in this chapter refer to the numbers in the
bibliography, Chapter XIII.
Averages of Median Scores in 15 Arithmetic Tests for Grades 3-8, Cleveland
and Grand Rapids. Number of Examples Right


I Grade 


Test 1 

Ill 1 

IV 1 

V 1 

VI 1 

VII 1 

VIII 

A \ 

13.4 \ 

n.i \ 

21.9 \ 

24.9 \ 

27.0 \ 

28.9 

B 

8.9 

12.8 ' 

16.8 ' 

19.5 ' 

21.1 

25.8 

0 

6.5 

11.7 

14.8 

16.8 

18.2 

19.9 

D 

6.3 

11-4 

15.0 

17.7 

20.8 

22.8 

S 

AS 


5.9 

6.7 

7.4 

8.0 

P 


4.5 

6.6 

7.7 

9.1 

10.6 

G 

2.0 

8.6 

6.1 

5.5 

6.0 

6.7 

H . :...... 

5.6 

6.0 i 

7.7 

8.6 

I 

0.6 

. . . 

1 A 

1.7 

8.1 

4.0 

4.7 

J 

1.9 


S.Q 

4.4 

6.1 

6.1 

K ' 

4.0 ' 

■■H 

7.0 

9.4 

11.4 

L 


1.7 


8.2 

3.8 

4.4 

M 

HI 

2.4 


4.1 

4.7 

5.4 

N j 

0.8 

n 

1.6 1 

1.9 

2.4 

0 


8.3 

4.8 

5.2 


4. Courtis Standard Tests, Series A. This series includes
eight tests: one for the combinations (0-9) in each of the four
operations, and one each for copying figures, speed reasoning,
fundamental operations, and reasoning. The series was devised in 1909
and was used extensively during the following years. However, the
author has discontinued its publication in favor of Series B, devised
in 1913. (Ref. 91.)


Standard Median Scores, Courtis Standard Research Tests, Series B

                      Addition        Subtraction     Multiplication  Division
Grade                 Speed  Acc.     Speed  Acc.     Speed  Acc.     Speed  Acc.
IV     General        7.4    64       7.4    80       6.2    67       4.6    57
       Courtis        6      100      7      100      6      100      4      100
       Boston         8      70       7      80       6      60       4      60
V      General        8.6    70       9.0    83       7.5    75       6.1    77
       Courtis        8      100      9      100      8      100      6      100
       Boston         9      70       9      80       7      70       6      70
VI     General        9.8    78       10.3   85       9.1    78       8.2    8?
       Courtis        10     100      11     100      9      100      8      100
       Boston         10     70       10     90       9      80       8      80
VII    General        10.9   75       11.6   86       10.2   80       9.6    90
       Courtis        11     100      12     100      10     100      10     100
       Boston         11     80       11     90       10     80       10     90
VIII   General        11.6   76       12.9   87       11.5   81       10.7   91
       Courtis        12     100      13     100      11     100      11     100
       Boston         12     80       12     90       11     80       11     90


Speed is the number of examples done in the time allowed. 

Accuracy is the percent of examples correct. 

“General” medians were determined by Courtis on the basis of the 1916 tabulations
and summaries of tabulations of other years. Courtis, S. A., Third, Fourth, and Fifth
Annual Accountings, 1913-16 (Department of Cooperative Research, Detroit).

The Boston standards were established after using the tests for three years. Ballou,
F. W., Arithmetic, the Courtis Standard Tests in Boston, 1912-15 (Bulletin No. 10 of the
Department of Educational Investigation and Measurement).
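As a hedged illustration of the speed and accuracy definitions above, the sketch below scores a hypothetical class and compares its medians with the Grade VI "General" addition medians from the table. It assumes each paper is recorded as (examples attempted, examples right) and that accuracy is the percent right of the examples attempted; the function name `courtis_scores` and the class data are invented for illustration.

```python
from statistics import median

def courtis_scores(attempted: int, right: int) -> tuple[int, float]:
    """Return (speed, accuracy) for one pupil's paper.

    speed    = number of examples done in the time allowed
    accuracy = percent of the examples done that are right (assumed reading)
    """
    if attempted == 0:
        return 0, 0.0
    return attempted, 100.0 * right / attempted

# Hypothetical class results: (examples attempted, examples right).
papers = [(10, 8), (12, 9), (8, 8), (11, 7), (9, 6)]
speeds = [courtis_scores(a, r)[0] for a, r in papers]
accuracies = [courtis_scores(a, r)[1] for a, r in papers]

class_speed = median(speeds)
class_accuracy = round(median(accuracies), 1)

# Grade VI "General" addition medians taken from the table above.
standard_speed, standard_accuracy = 9.8, 78
print(class_speed, class_accuracy)
print(class_speed >= standard_speed, class_accuracy >= standard_accuracy)
```

Here the class median speed (10) exceeds the standard while its median accuracy (75.0) falls short; in keeping with the caution given earlier in the chapter, differences this small should not be given great weight.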
5. Courtis Standard Research Tests, Series B. This series of
tests consists of one test in each of the four fundamental operations.
The tests measure the speed and accuracy with which the pupil
can perform these operations with one type of example. The
administration is very simple. The total time required to give them is
26 minutes. They have been used extensively since their first
publication in 1914. The measures have been proved reliable in 75
to 90 percent of the cases. Address S. A. Courtis, 82 Eliot St.,
Detroit, Michigan. (Refs. 77, 97.)

6. Guhin's Number Tests. These tests include 88 combinations
for both addition and multiplication. The standards are given
in terms of the number of seconds required to complete the test.
Address Hubb City School Supply Company, Aberdeen, South
Dakota.

Standards:  3rd grade, 150     6th grade, 120
            4th grade, 140     7th grade, 110
            5th grade, 130     8th grade, 100

7. Monroe's Diagnostic Tests. This series covers the four
fundamental operations in integers, common fractions, and decimal
fractions. It is thought that they will furnish a reasonably
complete diagnosis of the abilities of pupils to perform the operations
of arithmetic. Although the series consists of 21 tests, they have been
so arranged that the total time required for giving them is only
35½ minutes. Address Bureau of Educational Measurements and
Standards, Emporia, Kansas.

8. National Business Ability Tests. The tests for addition, 
subtraction and multiplication are abbreviated forms of the corres- 
ponding tests of the Courtis Standard Research Tests, Series B. In 
addition to these three, there are tests in multiplication of common 
fractions and in percentage. The standards are stated in terms of 
the number of minutes allowed for completing the respective tests. 
Address Sherwin Cody, Managing Director, 189 W. Madison St., 
Chicago, Illinois. 

9. Stone's Arithmetic Test for the Fundamental Operations. 
This test is of historical interest because it was used by Courtis 
in the experimentation which resulted in the derivation of Series A.
It was designed as a general test and has not been standardized ex- 
cept for the sixth grade. Address Bureau of Publications, Teach- 
ers College, Columbia University, New York City. (Ref. 138.) 

10. Thompson's Standardized Tests. This is an elaborate series
of tests upon the operations of arithmetic. The distinctive feature of
these tests is a mechanical device for scoring the papers. Address T. E.
Thompson, Monrovia, California.

11. Woody Arithmetic Scales. These consist of two series 
of four tests, one for each of the fundamental operations. They 
differ from such tests as the Courtis Standard Research Tests, Ser- 
ies B, in that the examples in each scale have been carefully graded 
and arranged in order of difficulty. In content they include inte- 
gers, decimal fractions, common fractions and denominate num- 
bers.* Series A and Series B are similar, except that Series A is 
more finely divided. Address Bureau of Publications, Teachers Col- 
lege, Columbia University, New York City. (Ref. 148.) 


Tentative Standards of Achievement for Woody Tests, Series A

Grade   Addition   Subtraction   Multiplication   Division
II      3.12       1.44          ...              ...
III     4.99       2.96          1.89             2.54
IV      6.11       4.22          4.05             3.21
V       6.99       5.47          5.52             4.94
VI      7.95       6.46          6.72             5.87
VII     8.65       7.31          7.26             6.59
VIII    9.01       7.64          7.93             7.16


Tentative Standards of Achievement for Woody Tests, Series B

Grade   Addition   Subtraction   Multiplication   Division
II      4.5        3             ...              ...
III     9          6             ?                3
IV      11         8             7                5
V       14         10            11               7
VI      16         12            15               10
VII     18         13            17               13
VIII    18.5       14.5          18               14


*Recently there has appeared a modification of the Woody tests known as
the Woody-McCall Mixed Fundamentals, Series B, I and II. These tests are
more difficult than the original Woody tests. Each sheet has on it problems in
all four of the fundamental operations, so that the pupil must choose the right
operation for each problem. In exploring for tests best fitted for selecting
gifted children in the 5th and 6th grades I have found the Woody-McCall
Mixed Fundamentals distinctly better than the Woody tests from which they
were derived. — G. M. W.
The standards are expressed in terms of the degree of difficulty
of the examples that are done correctly by just 50 percent of the
pupils.
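One way to locate "the degree of difficulty of the examples that are done correctly by just 50 percent of the pupils" is to interpolate between the two examples whose percent correct lies on either side of 50. The sketch below assumes hypothetical (difficulty value, percent of pupils correct) pairs for one class and uses straight-line interpolation; both the data and the interpolation rule are illustrative assumptions, not Woody's published procedure, and the function name is hypothetical.

```python
from typing import Optional

def fifty_percent_point(steps: list[tuple[float, float]]) -> Optional[float]:
    """Interpolate the difficulty at which exactly 50 percent of pupils succeed.

    `steps` holds (difficulty, percent_correct) pairs sorted by increasing
    difficulty; percent_correct is assumed to fall as difficulty rises.
    Returns None if the 50 percent level is never crossed.
    """
    for (d0, p0), (d1, p1) in zip(steps, steps[1:]):
        if p0 >= 50 >= p1:
            if p0 == p1:
                return d0
            # Straight-line interpolation between the two neighbouring examples.
            return d0 + (p0 - 50) * (d1 - d0) / (p0 - p1)
    return None

# Hypothetical class results on successive examples of one scale.
class_results = [(1.0, 96.0), (2.0, 88.0), (3.0, 71.0),
                 (4.0, 58.0), (5.0, 42.0), (6.0, 20.0)]
print(fifty_percent_point(class_results))  # 4.5
```

The class's 50 percent point (4.5 in this made-up case) would then be set beside the grade standard in the tables above.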

II. Arithmetic: Reasoning 

1. Bonser’s Reasoning Tests. These consist of two lists of 
ten problems each. The problems have been chosen so that the two 
tests are equal in difficulty. Address Bureau of Publications, 
Teachers College, Columbia University, New York City. (Ref. 75.) 

2. Buckingham's Reasoning Tests. These tests were devised
for use in the survey of the Gary and Prevocational Schools of New
York City. The problems of the tests were carefully evaluated and
arranged so that the two lists are equally difficult, but they were not
scientifically selected. (Refs. 82, 466.)

3. Courtis' Reasoning Tests. Tests 7 and 8 of the Courtis
Standard Research Tests, Series A (q. v.), are reasoning tests.

4. Rice’s Reasoning Tests. These were given by Rice in 1902 
and are of historical interest. They consist of a series of tests, one 
for each of the grades from fourth to eighth, inclusive. The prob- 
lems were selected as suitable for the pupils of the respective 
grades. (Ref. 129.) 

5. Starch’s Arithmetical Scale A. This test consists of a 
series of arithmetical problems which are arranged in order of in- 
creasing difficulty. Address Daniel Starch, University of Wiscon- 
sin, Madison, Wisconsin. (Ref. 134.) 

The following are standard scores for the ends of the respective
years, as derived from 2515 pupils in 18 schools:

Grade   III   IV    V     VI    VII    VIII
Score   4.5   6.2   7.8   9.4   11.0   12.6
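A class median on Scale A can be placed against these end-of-year standards to see roughly which grade standard it reaches. The sketch below is a hypothetical illustration: the function name and the straight-line interpolation between adjacent grades are assumptions, not part of Starch's published standards, and scores outside the table are simply clamped to Grade III or Grade VIII.

```python
# End-of-year standard scores for Starch's Arithmetical Scale A (from the table above).
STARCH_STANDARDS = {3: 4.5, 4: 6.2, 5: 7.8, 6: 9.4, 7: 11.0, 8: 12.6}

def grade_equivalent(score: float) -> float:
    """Estimate the (end-of-year) grade whose standard a score matches."""
    grades = sorted(STARCH_STANDARDS)
    lowest, highest = grades[0], grades[-1]
    if score <= STARCH_STANDARDS[lowest]:
        return float(lowest)
    if score >= STARCH_STANDARDS[highest]:
        return float(highest)
    for g0, g1 in zip(grades, grades[1:]):
        s0, s1 = STARCH_STANDARDS[g0], STARCH_STANDARDS[g1]
        if s0 <= score <= s1:
            # Assumed linear growth in score between adjacent grade standards.
            return g0 + (score - s0) / (s1 - s0)
    return float(highest)  # not reached for scores inside the table

print(round(grade_equivalent(8.6), 2))  # 5.5
```

A class median of 8.6 thus lies halfway between the fifth- and sixth-grade standards.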


6. Stone’s Reasoning Test. This is a single test designed to 
be given to Grades IV to VIII, inclusive. The problems have been 
carefully evaluated. The test was used in the survey of the public 
schools of Butte, Montana, and Salt Lake City, Utah. Address 
Bureau of Publications, Teachers College, Columbia University, 
New York City. 
Standards. Median Scores for the Stone Reasoning Test

Grade   Butte   Bridgeport   Salt Lake City   18,495 Wisconsin Pupils
V       2.7     6.1          4.8              2.4
VI      4.4     5.2          6.9              3.9
VII     6.3     6.8          9.1              5.4
VIII    8.2     4.5          11.0             6.9


Recently Stone has issued the following standards:

“That 80 percent or more of 5th-grade pupils reach or exceed a
score of 5.5, with at least 75 percent accuracy; that 80 percent or
more of 6th-grade pupils reach or exceed a score of 6.5, with at
least 80 percent accuracy; that 80 percent or more of 7th-grade
pupils reach or exceed a score of 7.5, with at least 85 percent
accuracy; that 80 percent or more of 8th-grade pupils reach or exceed
a score of 8.75, with at least 90 percent accuracy.” (Ref. 139.)
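Stone's standards combine a score requirement, an accuracy requirement, and an 80 percent threshold, so a class check has three parts. The sketch below assumes one reading of the quotation: each pupil must reach both the score and the accuracy minimum for the grade, and the class meets the standard when 80 percent or more of its pupils do. The `Paper` record, the function name, and the fifth-grade data are hypothetical.

```python
from dataclasses import dataclass

# Stone's standards as quoted above: grade -> (minimum score, minimum accuracy in percent).
STONE_STANDARDS = {5: (5.5, 75), 6: (6.5, 80), 7: (7.5, 85), 8: (8.75, 90)}

@dataclass
class Paper:
    score: float     # pupil's score on the Stone reasoning test
    accuracy: float  # pupil's accuracy, in percent

def meets_stone_standard(grade: int, papers: list[Paper]) -> bool:
    """True if 80 percent or more of the pupils reach both minimums for the grade."""
    if not papers:
        return False
    min_score, min_accuracy = STONE_STANDARDS[grade]
    passing = sum(1 for p in papers
                  if p.score >= min_score and p.accuracy >= min_accuracy)
    # Integer comparison avoids floating-point error in "80 percent or more".
    return passing * 100 >= 80 * len(papers)

fifth_grade = [Paper(6.0, 80), Paper(5.5, 75), Paper(7.2, 90),
               Paper(5.0, 95), Paper(6.1, 78)]
print(meets_stone_standard(5, fifth_grade))  # True: 4 of 5 pupils qualify
```

If the accuracy figure were instead meant as a class-wide percentage, the accuracy test would move out of the per-pupil condition.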

III. Drawing 

1. Thorndike's Drawing Scale. This scale, devised by E. L.
Thorndike in 1913, consists of a series of drawings arranged in
order of merit as determined by competent judges. Address 
Bureau of Publications, Teachers College, Columbia University, 
New York City. (Ref. 159.) 

IV. Geography 

1. The Boston Tests. The two tests of this series, one on the
United States and the other on Europe, consist of well-chosen
questions. The relative difficulty of the questions was determined
upon the basis of the percent of correct answers. The tests were
devised in an effort to determine (1) the character of achievement
in geography and (2) the possibility of scientific measurement
of educational results in geography. This significant comment is
made: “The results show how inadequate the customary examination
or test in geography is to measure ability in geography.”
(Ref. 216.)

2. Buckingham's Geography Test. This test was devised for
use in the survey of the Gary and Prevocational Schools of New
York City. It consists of two sets of 20 questions which were
evaluated upon the basis of the percent of correct responses. (Ref.
466.)

3. Hahn-Lackey Geography Scale. This scale consists of several
hundred geographical questions which were found to be common
to six modern texts and which satisfied certain other criteria.
These questions have been classified according to difficulty. In 
appearance the scale is very much like the Ayres’ Spelling Scale 
and is to be used the same way. Address H. H. Hahn, Wayne State 
Normal School, Wayne, Nebraska. 

4. Starch's Geography Tests, Series A. The common elements 
of five geography texts have been arranged in five parallel tests. 
The exercises of the tests are in the form of mutilated sentences. 
Address Daniel Starch, University of Wisconsin, Madison, Wiscon- 
sin. 

5. Thompson's Standardized Tests in Geography. These con- 
sist of a test each for North and South America. They deal en- 
tirely with place geography. An important feature is a mechanical 
device for scoring the papers. Address T. E. Thompson, Monrovia, 
California. 

6. Witham's Standard Geography Tests. These are a series 
of tests arranged to test quickly and easily pupils’ knowledge of 
certain geographical facts. The facts for the tests on the world 
are grouped under these heads: (1) geographical divisions, (2) 
form and motion of the earth, (3) the hemispheres, (4) land and 
water forms, (5) homes of the races, (6) industries, and (7) largest 
cities. Address E. C. Witham, Southington, Conn. (Ref. 221.) 

V. Handwriting 

The scales described below are used to measure the quality of 
handwriting. The speed of handwriting is measured by having 
suitable material written under specified conditions for a definite 
number of minutes. In order that a measurement of speed may be 
most significant, it must be made when the quality of the pupil’s 
handwriting is approximately standard. 

“Pupils should be asked to write a suitable selection which they
have memorized. To guard against lapses of memory, the pupils 
should be asked to repeat in concert the selection to be used. If 
convenient, it is well to provide each pupil with a printed or type- 
written copy of the selection. When this cannot be done, the se- 
lection may be written on the blackboard where all can see it. The 
selection should contain no words which the pupils cannot spell 
readily. It is well to have them practice writing the more difficult
words before the test is begun. Do not use material which the pu- 
pils must compose as they write, for this would be worthless in test- 
ing. The rate of writing unfamiliar material from a printed copy 
will vary with the pupils’ rate of reading and so will not give a 
true measure of speed. Dictated material should be used only 
when the teacher wishes to control the speed, not when speed is to be 
measured. 

“Different investigators have required pupils to write different
material. Several have used the first line or the first stanza of the
poem ‘Mary had a little lamb.’ ‘Sing a Song of Sixpence’ has
been used. Other sentences which have furnished copy are ‘Jolly
kings bring gifts while happy maids dance.’ ‘A quick brown fox
jumps over the lazy dog.’* ‘Then the carelessly dressed gentleman
stepped lightly into Warren’s carriage and held out a small card.
John vanished behind the bushes and the carriage moved along
down the driveway.’** In the Cleveland Survey the first three sen-
tences of Lincoln’s Gettysburg Address were written, and Ayres
has used the same selection in the ‘Gettysburg Edition’ of his scale.
In several surveys the pupils were allowed to write any familiar
stanza of a poem. The chief principles to bear in mind in selecting
materials are: first, to use material in the lower grades which will
not furnish difficulties in spelling and remembering; and second,
to use material which will be uniform in all classes which are to
be compared.”***
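The speed measurement described above comes down to a rate: the amount written divided by the length of the writing period. A minimal sketch, assuming speed is reported in letters per minute and that only alphabetic characters are counted (neither convention is stated in this section); the function name `letters_per_minute` is hypothetical.

```python
def letters_per_minute(written_text: str, minutes: float) -> float:
    """Handwriting speed for a timed writing test.

    Assumes speed is reported in letters per minute; spaces, punctuation
    and other non-alphabetic characters are not counted.
    """
    if minutes <= 0:
        raise ValueError("the writing period must be a positive number of minutes")
    letters = sum(1 for ch in written_text if ch.isalpha())
    return letters / minutes


# Illustrative only: what one pupil finished in a one-minute test.
sample = "Four score and seven years ago our fathers brought forth"
print(letters_per_minute(sample, 1))  # 47.0
```

Because the directions call for memorized copy that is the same for every class, the count depends only on how far each pupil got in the allotted time.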

1. Ayres’ Scale for Measuring the Handwriting of School
Children. This is known as the “Three Slant Edition,” or more
simply as the Ayres Scale. It consists of three types of specimens 
of the handwriting of school children — vertical, semi-slant and full 
slant — arranged in order of legibility as determined experimentally. 
The values 20, 30, 40, up to 90, have been assigned to the specimens. 
This scale has been used very widely. Address Russell Sage Foun- 
dation, New York City. (Ref. 227.) 

*This sentence was used in securing specimens for the Freeman Scale. It
contains all the letters of the alphabet.

**These sentences were used in securing the specimens for the Thorndike
Scale.

***Walter S. Monroe, Educational Tests and Measurements, p. 146. Houghton
Mifflin Company, 1917.
2. Ayres’ Scale for Measuring the Quality of Handwriting of
Adults. This scale is similar to that described above, except that
specimens of the handwriting of adults are used instead of the
handwriting of school children. Address Russell Sage Foundation,
New York City. (Ref. 229.)

3. Ayres’ Gettysburg Edition. This scale differs from the
other two in certain important characteristics. It consists of speci-
mens of school children’s handwriting on ruled paper, and there is
only one specimen for each division of the scale instead of three
representing different degrees of slant. The copy, the first three
sentences of Lincoln’s Gettysburg Address, is the same for all spec-
imens. The scale has printed on it standards for both speed and
quality, and complete directions for its use. Ayres asserts that the
purpose of the new features is “to increase the reliability of meas-
urements of handwriting.” Address Russell Sage Foundation, New
York City.

4. Breed and Downs’ Scale. This scale was constructed in
making a survey of the handwriting in the public schools of High- 
land Park, Michigan. The specimens were scored by means of the 
Thorndike Scale and then certain ones selected for a five-step scale 
for each of the following grades, 3d A, 3d B, 4th A, 5th A and 6th 
A. Thus, it differs from other scales in having a special scale for 
each of the grades named. (Ref. 234.) 

5. Courtis’ Standard Research Tests, Handwriting, Series W.
Test I, Handwriting, is an untimed “maximum performance” test, 
designed to secure samples of the children’s best writing after prac- 
tice. Test II, Filing Test, is a “free-choice” copying test, designed 
to secure samples of the children’s writing under working condi- 
tions. The test consists of the names and addresses of ten business 
firms, to be copied in alphabetical order. In both tests the quality 
of the writing is to be measured with the Ayres scale. The differ- 
ence in quality between the two samples reveals any “lack of trans- 
fer” from the work of the writing class to ordinary writing. The 
material in the Filing Test has been so chosen as to afford excellent 
material for an analysis of the defects in the writing of a particular 
child. Address S. A. Courtis, 82 Eliot Street, Detroit, Michigan. 
6. Freeman’s Handwriting Scale. This scale differs from the
others in that there is one scale for each of the following character-
istics of handwriting: (1) uniformity of slant, (2) uniformity of
alignment, (3) quality of line, (4) letter formation, and (5) spac-
ing. Only three degrees of each characteristic, 1, 3 and 5, are in-
cluded in the scale, although the intermediate values, 2 and 4, may
be used. This scale is designed for diagnosis rather than general
measurement. Address Houghton Mifflin Company. (Ref. 240.)
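Because the Freeman scale rates each characteristic separately, a pupil’s result is a profile rather than a single number, and its diagnostic use lies in finding the weakest characteristic. A minimal sketch, assuming whole-number ratings from 1 to 5 and assuming that the higher number means the better writing (the direction of the scale is not stated here); the function name `weakest_characteristics` is hypothetical.

```python
# The five characteristics rated by the Freeman scale, in the scale's order.
CHARACTERISTICS = (
    "uniformity of slant",
    "uniformity of alignment",
    "quality of line",
    "letter formation",
    "spacing",
)


def weakest_characteristics(ratings: dict[str, int]) -> list[str]:
    """Return the characteristic(s) given the lowest rating.

    Assumes each characteristic is rated 1 to 5 and that a higher rating
    means better writing.
    """
    unknown = set(ratings) - set(CHARACTERISTICS)
    missing = set(CHARACTERISTICS) - set(ratings)
    if unknown or missing:
        raise ValueError(f"unknown: {sorted(unknown)}, missing: {sorted(missing)}")
    for name, value in ratings.items():
        if not 1 <= value <= 5:
            raise ValueError(f"{name} rating {value} is outside 1-5")
    lowest = min(ratings.values())
    # Report in the scale's own order so the result is stable.
    return [name for name in CHARACTERISTICS if ratings[name] == lowest]


profile = {
    "uniformity of slant": 4,
    "uniformity of alignment": 3,
    "quality of line": 2,
    "letter formation": 2,
    "spacing": 5,
}
print(weakest_characteristics(profile))  # ['quality of line', 'letter formation']
```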

7. Gray’s Score Card. This score card is of the same general
character as those which are used in judging grain and livestock.
It is based upon a determination of the important characteristics
of handwriting. Its function is similar to that of the Freeman scale,
i. e., to furnish a diagnosis rather than a general measurement.
Address C. T. Gray, University of Texas, Austin, Texas. (Ref. 245.) 

8. Johnson and Stone’s Scale. This scale is similar in gen-
eral plan to the Ayres and Thorndike Scales, but is based on several
factors, including movement and a detailed analysis of legibility.
Each specimen of the scale is accompanied by a legend which states 
its defects and merits in terms of the analysis appended, which in- 
cludes seven factors — letter formation, uniformity of slant, uni- 
formity of alignment, spacing, quality of line, size, and degree of 
slant. (Ref. 247.) 

9. Thorndike’s Scale. This scale was constructed on the basis
of three characteristics: beauty, legibility, and general merit. The
degree of these characteristics represented in the specimens of the
scale was determined by the consensus of opinion of competent
judges. The numerical values of the specimens of the Thorndike 
Scale range from 4 to 18, and one or more specimens are given for 
each degree of quality. Address Bureau of Publications, Teachers 
College, Columbia University, New York City. (Ref. 263.) 

10. Zaner and Blossom Handwriting Scales. These are a 
series of scales for the several grades, designed to be used with a 
particular system of handwriting. Address Zaner and Blossom 
Co., Columbus, Ohio. 
Standard Median Scores: Speed of Handwriting

                        Grades                                    Approximate number
Source                  II    III   IV    V     VI    VII   VIII  of specimens scored
Cleveland               ...   ...   ...   60    70    76    80    25,887
Iowa Schools            39.2  49.2  61.9  65.5  72.6  75    76.5  28,000
Starch’s Standard       31    38    47    67    65    76    76.6  4,740
Kansas Medians          32    35    51    61    67    71    78    6,000
Fifty-six cities        30.6  43.8  61.2  59.1  62.8  67.9  78    84,000
Freeman’s Standards     36    48    56    65    72    80    90

Standard Median Scores: Quality of Handwriting

                        Grades                                    Scale       Approximate number
Source                  II    III   IV    V     VI    VII   VIII  used        of specimens scored
Cleveland               ...   ...   ...   45    48    50    55                25,287
Iowa Schools            35.7  39.8  44.0  49.1  52.3  57    55    Ayres       28,000
Starch’s Standard       27    33    37    43    47    58    67    Ayres       4,740
Kansas Medians          44    47    50    55    59    64    70    Ayres
Fifty-six cities        39.7  42    45.8  50.5  54.5  58.9  62.8  Ayres       84,000
Freeman’s Standards     44    47    50    55    59    64    70    Ayres
Salt Lake City          ...   9.2   10.7  11.1  11.3  12.2  12.8  Thorndike   2,500
Butte, Montana          8.2   8     8.8   8.9   11.6  11.2  12.1  Thorndike   1,400
Southington, Conn.      10 (grade column not legible)             Thorndike   1,200
Connersville, Ind.      ...   ...   10    10.3  11.7  11.7  11    Thorndike
(row label not legible) ...   9.36  10.18 10.76 11.34 11.89 12.66


•Judd, Charles H., Measuring the Work of the Public Schools. Report,
Survey Committee of the Cleveland Foundation, 1916.

•Ashbaugh, E. J., Handwriting of Iowa School Children. University of
Iowa, Extension Division, Bulletin No. 15, March 1916.

•Starch, D., The Measurement of Efficiency in Reading, Writing, Spelling,
and English. University of Wisconsin, 1914.

•DeVoss, J. C., Second Annual Report of Bureau of Educational
Measurements and Standards. Kansas State Normal School, Emporia, Kansas.

•Freeman, F. N., Fourteenth Yearbook of this Society, Part I, 1915. See
also the Sixteenth Yearbook, Part I, 1917, Ch. IV.

•Report of a Survey of the Schools of Salt Lake City, Utah (1915).

•Report of a Survey of the Schools of Butte, Montana, Ch. IV (1914).

•Witham, E. C., All the Elements of Handwriting Measured. Educational
Administration and Supervision, 1; 1915, pp. 313-24.

•Wilson, G. M., The Handwriting of School Children. Elementary School
Teacher, 6; 1911, pp. 450-53.

VI. History

1. Buckingham’s Tests. These tests were used in the survey
of the Gary and Prevocational Schools of New York City. They
consist of two sets of questions which have been evaluated on the
basis of the percent of correct answers. More recently Buckingham
has studied the relation between the ability to remember historical
facts and the ability to use them. In this study specially devised
tests were used. (Ref. 466.)
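Evaluating questions “on the basis of the percent of correct answers” amounts to computing, for each question, the share of pupils who answered it correctly; the lower the percent, the harder the question. A minimal sketch, assuming each pupil’s answers are recorded as True (correct) or False per question; the data layout and the names `percent_correct` and `hardest_first` are hypothetical.

```python
def percent_correct(responses: list[list[bool]]) -> list[float]:
    """Percent of pupils answering each question correctly.

    responses[p][q] is True when pupil p answered question q correctly.
    """
    if not responses:
        raise ValueError("no pupils")
    n_questions = len(responses[0])
    if any(len(row) != n_questions for row in responses):
        raise ValueError("every pupil must have an entry for every question")
    n_pupils = len(responses)
    return [
        100 * sum(row[q] for row in responses) / n_pupils
        for q in range(n_questions)
    ]


def hardest_first(responses: list[list[bool]]) -> list[int]:
    """Question indices ordered from lowest to highest percent correct."""
    percents = percent_correct(responses)
    return sorted(range(len(percents)), key=lambda q: percents[q])


# Four pupils, three questions (invented data).
responses = [
    [True, False, True],
    [True, False, False],
    [True, True, False],
    [True, False, False],
]
print(percent_correct(responses))  # [100.0, 25.0, 25.0]
print(hardest_first(responses))    # [1, 2, 0]
```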

2. The Bell and McCollum Test. This test consists of a series
of questions which have been very carefully selected because of their
importance. The topics included are: (1) dates-events, (2) men-events,
(3) events-men, (4) historic terms, (5) political parties,
(6) divisions of history and (7) map-study. The test can be ad-
ministered in a forty-minute period. (Ref. 270.)

3. Harlan’s Test of Information in American History. This
is a test of historical information based upon the study of Bagley
and Rugg, “The Content of American History Texts.” Address
Chas. L. Harlan, College of Education, University of Minnesota,
Minneapolis, Minn.

4. Starch’s American History Tests, Series A. This test is
based upon the facts and principles common to five modern texts.
The exercises are in the form of mutilated sentences. Four dupli-
cate forms are available. Address Daniel Starch, University of
Wisconsin, Madison, Wis.

VII. Language 

1. Breed and Frostic Scale. The compositions used by Breed 
and Frostic in deriving their scale were written by sixth-grade 
pupils under uniform conditions. A part of a story called The Pic- 
nic was read to the class, and they were given 20 minutes to com- 
plete it. The method of selecting compositions for the scale and 
determining scale values was similar to that employed by Hillegas. 
(Ref. 165.) 

2. Courtis Standard Tests in English. See Reading, below. 

3. Harvard-Newton Composition Scale. The Harvard-Newton 
Composition Scale consists of four separate scales, one for each form
of discourse: argumentation, description, exposition, and narration.
Each of the scales consists of six compositions written by eighth- 
grade pupils and arranged in order of merit as determined by the 
marks assigned by teachers, rating them as eighth-grade composi- 
tions. For each composition there is given a statement of the most 
significant merits and defects. Address Harvard University Press, 
Cambridge, Mass. (Ref. 161.) 
Standards: Median Scores for Harvard-Newton Scale

Grade     Number of       Median
          Compositions    Score
VIIb      67              60
VIIa      72              64
VIIIb     68              66
VIIIa     61              69
IXb       57              68
IXa       61              68


4. Hillegas Scale for the Measurement of the Quality in Eng-
lish Composition for Young People. This consists of ten compositions
ranging from an artificial production, whose scale value is
zero, to the tenth composition, whose scale value is 9.3. Three of 
the ten compositions are artificial productions, five were written by 
high-school pupils, and the remaining two by college freshmen. No 
two were written on the same topic and they vary greatly in length 
and type. Each degree of merit is represented by only one com- 
position. (Ref. 172.) 


Standards for the Hillegas Scale

Grade   Salt Lake   Butte   Trabue:   Trabue: score above which
        City                median    three-fourths of pupils
                            score     should rank
IV      2.9         2.34    3.5       3.0
V       3.1         2.87    4.0       3.5
VI      3.8         3.40    5.0       4.0
VII     4.4         3.75    5.0       4.5
VIII    5.4         4.11    5.5       5.0
IX      ...         ...     6.0       5.5
X       ...         ...     6.5       6.0
XI      ...         ...     6.9       6.4
XII     ...         ...     7.2       6.7


5. Nassau County Supplement. The Nassau County Supple- 
ment to the Hillegas Scale consists of nine compositions, seven of 
which were written by elementary-school pupils on the topic “What
I should like to do next Saturday.” The compositions of the scale
were carefully selected and evaluated by an elaborate method which 
cannot be even sketched here. Copies may be obtained from the 
Bureau of Publications, Teachers College, Columbia University, 
New York City. (Ref. 203.) 

6. Thorndike’s Extension of the Hillegas Scale. This extension
is similar to the original scale, except that a larger number of
compositions have been used, thereby making a more finely divided 
scale as well as providing several compositions for each degree of 
merit in the middle of the scale. Address Bureau of Publications, 
Teachers College, Columbia University, New York City. 
7. The Trabue Completion-Test Language Scales. Trabue
has devised a series of Completion-Test Language Scales for the 
general measurement of language ability. Each scale consists of 
sentences from which one or more words have been omitted. The 
position of the omitted words is indicated by a blank. The pupil is 
to write in the missing words. The relative difficulty of the sen- 
tences has been carefully determined, and they have been arranged 
in order of difficulty. It is claimed for these tests that a pupil’s
“language ability” is very closely related to his score on these
scales. Copies may be obtained from the Bureau of Publications, 
Teachers College, Columbia University, New York City. (Ref. 202.) 


Standards: Trabue Completion-Test Scales

Grade   Median      Grade   Median
II      3.0         VIII    13.8
III     6.0         IX      14.2
IV      8.0         X       15.8
V       9.6         XI      15.8
VI      11.0        XII     16.2
VII     ...

8. Willing’s Scale. Willing used compositions written by
pupils in Grades four to eight on the topic “An Exciting Experi- 
ence.” Several particular exciting experiences were suggested, and 
20 minutes was allowed for writing. In determining the composi- 
tions to be used for the scale, “all errors in spelling, punctuation, 
capitalization and grammar were counted and corrected.” The 
compositions selected as samples for the scale were those which had 
the same rank in “story value” and frequency of errors. Ad- 
dress Bureau of Measurements and Standards, Emporia, Kansas. 
(Refs. 504, 513.) 

For the Denver Survey the following median scores were ob-
tained:

Grade     6th A   7th A   8th A
Median    50.9    60.2    63.4

9. Buckingham’s Grammar Test. In making the survey of the
Gary and the Prevocational Schools of New York City, Bucking- 
ham used a series of questions upon English grammar. These ques- 
tions were carefully evaluated upon the basis of difficulty. They 
have been re-arranged and published by Haggerty. (See below.) 
(Ref. 466.) 
10. Charters’ Grammar Test. This test consists of sentences
containing an incorrect form. The pupil is to write the sentence in 
the correct form and give the grammatical reason for doing so. The 
incorrect grammatical forms were selected from the errors occur- 
ring in the oral and written speech of pupils. Address W. W. Char- 
ters, University of Illinois, Urbana, Illinois. 

11. Haggerty’s Grammar Test. Same as Buckingham’s, with
the questions re-arranged and printed in convenient form. Ad-
dress Bureau of Cooperative Research, University of Minnesota,
Minneapolis, Minnesota. 

12. National Business Ability Tests. These include two tests 
on grammatical form and one on punctuation. In the grammar 
tests the pupil is to choose between two forms which are given. Ad- 
dress Sherwin Cody, Managing Director, 189 West Madison St., 
Chicago, Illinois. 

13. Starch’s Grammatical Scales. Starch has devised three
scales (A, B, and C) to measure a pupil’s ability to use correctly
certain language forms. His Grammatical Scale A consists of a
series of exercises arranged in order of increasing difficulty. As
tentative standards of attainment Starch gives the following scores
for the use of these scales (Ref. 214):


Grade VII VIII IX X XI XII Freshmen 

Score 8.0 8.3 8.6 8.9 9.2 9.6 10.3 


14. Starch’s Punctuation Scale. Starch has also devised a
Punctuation Scale which is similar in form to the Grammatical
Scales. The exercises consist of sentences to be punctuated. The
following are tentative standard scores of attainment for the ends
of the respective school years:

         Grades        High School                University
Year     7      8      1      2      3      4      1
Score    8.0    8.3    8.6    8.9    9.2    9.5    10.8

15. Starch’s Grammatical Tests. Starch has also devised three
tests for measuring directly a pupil’s ability to recognize certain
language forms. In Test 1 the pupil is asked to mark the part of
speech of each word in a certain printed text. His score is the num-
ber he designates correctly in three minutes. Test 2 calls for the
designation of the case of the nouns in another printed text. Test 3
has to do with the tense and mode of verbs.


Standards: Starch’s Grammatical Tests

               Grades      High School             University
Year           7     8     1     2     3     4     3
Score, Test 1  30    33    36    40    43    46    60
Score, Test 2  13    16    20    23    26    30    45
Score, Test 3  13    16    20    23    26    30    45


Address Daniel Starch, University of Wisconsin, Madison, 
Wisconsin. 

16. Thompson's Research Test in Grammar. This is a test 
of the pupil’s ability to indicate the part of speech in a list of words. 
The feature of the test is a mechanical device for scoring the papers. 
Address T. E. Thompson, Monrovia, California. 

17. Boston Copying Test. This test was devised to measure 
the ability of pupils to copy printed matter. In giving the test, 
each pupil was provided with a printed selection which he was asked 
to copy with pen and ink. In marking the papers the following er-
rors were noted: in spelling, capitalization, punctuation, undotted
i’s, uncrossed t’s; in omitting words, in adding words, in wrong 
words used, and in misplaced words. (Ref. 164.) 

The errors noted consisted of nine different kinds, and the num-
ber of each kind made in this test by 4,494 pupils is shown by the
following tabulation:

Spelling                      5,829
Capitalization                  644
Omitted words                 4,077
Added words                     606
Wrong words used                840
Misplaced words                 105
Punctuation                   5,876
Undotted i’s                  8,794
Uncrossed t’s                   606

Total                        27,377

Average errors per pupil       5.54
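The tabulation can be checked by summing the nine error counts and dividing by the number of pupils. The sketch below uses the counts exactly as printed. Their sum agrees with the printed total of 27,377, but 27,377 divided by 4,494 pupils is about 6.09 errors per pupil rather than 5.54; the printed average would correspond to roughly 4,940 pupils, so one of the two printed figures may be in error. The function name `error_summary` is hypothetical.

```python
# Error counts from the Boston Copying Test tabulation, as printed.
BOSTON_COPYING_ERRORS = {
    "spelling": 5_829,
    "capitalization": 644,
    "omitted words": 4_077,
    "added words": 606,
    "wrong words used": 840,
    "misplaced words": 105,
    "punctuation": 5_876,
    "undotted i's": 8_794,
    "uncrossed t's": 606,
}


def error_summary(counts: dict[str, int], pupils: int) -> tuple[int, float]:
    """Return (total errors, average errors per pupil)."""
    if pupils <= 0:
        raise ValueError("number of pupils must be positive")
    total = sum(counts.values())
    return total, total / pupils


total, average = error_summary(BOSTON_COPYING_ERRORS, pupils=4_494)
print(total, round(average, 2))  # 27377 6.09
```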
VIII. Music

1. Seashore’s Musical Talent Chart. This chart is based upon
the analysis of musical ability and offers a graphic means of rep- 
resenting the pupil’s musical ability. Address Carl E. Seashore, 
University of Iowa, Iowa City, Iowa. 

IX. Silent Reading 

1. Brown’s Silent Reading Test. This test consists of a very
interesting reading selection, which is used in Grades III to VIII. 
Duplicate selections of equivalent difficulty are obtainable. The 
directions require that the children read the selection silently for 
exactly one minute, then draw a line around the word which they 
have reached when the examiner calls ‘‘Stop.” The number of 
words read makes the score in speed. 

The children are then asked to write as much as they can re- 
member of what they have read. A key is provided for the exam- 
iner to use in scoring the papers. On it are listed all the separate 
ideas contained in the selection. By comparing the child’s papers 
with the key, the examiner determines how many different points 
there are in what the child read. Then his reproduction is examined
carefully to determine (1) quantity and (2) quality of comprehension.
Address Bureau of Research, 25 Capitol St., Concord,
New Hampshire. (Refs. 286, 287, 288.) 


Standards: Tentative Scores with the Brown Silent Reading Test

             Words per   Comprehension   Reading
             Second                      Efficiency
Grade III    3.32        46              127.8
Grade IV     3.55        65              217.1
Grade V      4.40        61              291.0
Grade VI     4.54        68              295.0
Grade VII    4.65        78              322.3
Grade VIII   4.84        79              323.6



2. Courtis Standard Research Tests in English. These are a
series of tests devised by Courtis to measure speed and comprehen- 
sion of silent reading. The series of tests was so complex that the 
marking of the test papers was a laborious task. For this reason 
the publication of the tests has been discontinued. (Ref. 291.) 

3. Courtis Research Tests in Silent Reading (Series R, Test 2).
This test is suitable for Grades I to VI. It measures a phase of read-
ing ability acquired in Grades II, III, and IV. In the first part of the
test the children read a simple child’s story under normal condi-
tions. From this work a measure of the rate of reading is derived.
In the second part of the test the paragraphs of the story are re-
printed and under each are given five simple questions about the
paragraph. The questions may be answered by “yes” or “no,”
and they are designed to measure a child’s comprehension of the
relation existing between the essential elements of the story. The
tests are available in two forms of nearly equal difficulty, so that
measurements may be made at the beginning and end of the year.
Address S. A. Courtis, 82 Eliot Street, Detroit, Michigan. (Ref.
293.)

Courtis Silent Reading: Series R, Test 2
(Median Grade Scores at the End of the Year)

Grade                              II    III   IV    V     VI
Words read per minute              84    113   145   168   191
Questions answered in 5 minutes    16    24    30    37    40
Index of comprehension             52    78    89    93    95


4. Fordyce’s Scale for Measuring the Achievements in Read-
ing. This scale consists of a selection to be read, upon which the
pupils are required to answer certain questions that have been 
weighted for determining the comprehension score. The speed of 
reading is found by having the pupils mark the word they have 
reached at the end of the stated interval. In order that all may 
have the information necessary for answering the questions, the 
pupils are then directed to finish the story. Address the Univer- 
sity Publishing Co., Lincoln, Nebraska. 


Standards in Percents: Fordyce Silent Reading Test

Test No. 1, designed for Grades III, IV and V.

Grade      III   IV    V
Speed      90    95    100
Quality    57    71    74

Test No. 2, designed for Grades VI, VII and VIII.

Grade      VI    VII   VIII
Speed      ...   100   100
Quality    41    45    50


5. Gray's Silent Reading Tests. These tests consist of three 
selections, one for Grades II and III, one for Grades IV, V, and VI, 
and another for Grades VII and VIII. The selections are so ar- 
ranged on the pages that the time required to read one hundred 
words can be readily ascertained. Only one child is tested at a time. 
After completing the reading, the child, if in the second or third
grade, tells the story to the examiner, who writes it down. In the 
grades above the third, the child writes all he can remember of the 
story, and then writes answers to a set of questions which is fur- 
nished him by the examiner. The child’s score for quality of read- 
ing is assigned on the basis of two factors, reproduction and ac- 
curacy. Reproduction is determined by the number of words which 
remain in the child’s composition, after all wrong or irrelevant 
statements and repetitions are stricken out. Accuracy is deter- 
mined on the basis of ten points for each correct answer. The 
quality mark is the average of these two. Address William S. Gray, 
School of Education, University of Chicago, Chicago, Illinois. 
(Refs. 297, 300a.) 


Standard Scores for Gray’s Silent Reading Tests

Grade                     II    III   IV    V     VI    VII   VIII
Rate (words per second)   1.50  2.30  2.20  2.57  2.79  2.69  2.87
Quality                   32    37    29    32    39    22    27
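The quality mark described for Gray’s silent reading tests is simple arithmetic: reproduction is the number of words left after the examiner strikes out wrong or irrelevant statements and repetitions, accuracy is ten points for each correct answer, and the mark is the average of the two. A minimal sketch, assuming a pupil above the third grade (who writes answers to the question set) and assuming the striking-out has already been done by hand; the rate helper reflects the page arrangement that lets the time for one hundred words be read off. The function names are hypothetical.

```python
def gray_quality_mark(remaining_words: int, correct_answers: int) -> float:
    """Quality-of-reading mark: average of reproduction and accuracy.

    remaining_words: words left in the pupil's written reproduction after
        wrong or irrelevant statements and repetitions are struck out.
    correct_answers: questions answered correctly, worth 10 points each.
    """
    if remaining_words < 0 or correct_answers < 0:
        raise ValueError("counts cannot be negative")
    reproduction = remaining_words
    accuracy = 10 * correct_answers
    return (reproduction + accuracy) / 2


def words_per_second(words: int, seconds: float) -> float:
    """Reading rate, e.g. from the time taken to read one hundred words."""
    if seconds <= 0:
        raise ValueError("reading time must be positive")
    return words / seconds


print(gray_quality_mark(remaining_words=28, correct_answers=4))  # 34.0
print(words_per_second(100, 40))  # 2.5
```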


6. Haggerty Visual Vocabulary Tests. The tests prepared
by Haggerty, of the Bureau of Cooperative Research, School of
Education, University of Minnesota, are but a slight modification
of the Thorndike Visual Vocabulary Scales, with the addition of
an oral test for children of Grades I and II. This test will be de-
scribed under the heading of “Oral Reading.” Scale R2, of which
there is one sheet for children of Grades III and IV and another
sheet containing part of the same words and additional more diffi-
cult words for Grades V, VI, VII and VIII, is devised in exactly
the same way as the Thorndike scales. Methods of scoring are
somewhat simpler, and the lists are briefer than those used by
Thorndike. Forms of equivalent difficulty are obtainable. Ad-
dress Bureau of Cooperative Research, University of Minnesota,
Minneapolis, Minnesota. (Ref. 301.)

7. Kansas Silent Reading Tests. These tests were devised by
F. J. Kelly. Both speed and comprehension of reading are com-
bined in a single mark. These tests consist of graded lists of exer-
cises which have been carefully evaluated. Each exercise consists
of the directions for doing something, which is very simple after
the pupil has fully understood the directions. His comprehension
of the exercise is measured by what he does. Test I is for Grades
III, IV, and V; Test II for Grades VI, VII and VIII; Test III for
Grades IX, X, XI, and XII. Address Bureau of Educational
Measurements and Standards, Emporia, Kansas. (Refs. 314, 315.)


Standards: Median Scores, Kansas Silent Reading Tests
(Based upon more than 100,000 Scores)

Grade                   III   IV    V     VI    VII   VIII  IX    X     XI    XII
Twenty-five percentile  2.5   6.1   9.4   9.4   11.8  13.7  16.0  17.9  18.7  22.3
Median score            5.8   9.5   13.2  13.9  16.2  19.2  22.9  25.6  26.5  29.7
Seventy-five percentile 8.2   13.6  17.5  19.8  21.9  26.4  30.4  31.9  33.1  34.1
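Each column of the table summarizes one grade’s distribution of scores by three points: the twenty-five percentile, the median, and the seventy-five percentile. A minimal sketch of computing these points for one group of pupils with Python’s standard `statistics.quantiles`; the scores are invented for illustration, and the percentile convention used in the original tabulation is not stated, so small groups may give slightly different cut points than hand methods of the period. The function name is hypothetical.

```python
import statistics


def quartile_summary(scores: list[float]) -> dict[str, float]:
    """Twenty-five percentile, median, and seventy-five percentile.

    Uses statistics.quantiles with method="inclusive", which treats the
    scores as the whole group tested; other conventions give slightly
    different cut points for small groups.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {"25th percentile": q1, "median": q2, "75th percentile": q3}


# Hypothetical scores for one class.
print(quartile_summary([3, 5, 6, 8, 9, 12, 14, 15, 17]))
# {'25th percentile': 6.0, 'median': 9.0, '75th percentile': 14.0}
```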


8. The Minnesota Scale Beta. This is a slightly modified form
of Thorndike’s Scale Alpha (q. v.). It is printed in a form which
is more convenient for use. This scale is issued in two forms which 
are approximately equal in value. Address Bureau of Cooperative 
Research, University of Minnesota, Minneapolis, Minnesota. 

9. Monroe’s Standardized Tests in Silent Reading. In these
tests those features of the Kansas Silent Reading Tests which have
proved satisfactory have been incorporated. The exercises have
been secured from school readers and other books which children
read. Test I is for Grades III, IV and V; Test II for Grades VI,
VII, VIII; Test III for Grades IX, X, XI, and XII. These tests
are issued in three forms which are equivalent in value. Address 
Bureau of Educational Measurements and Standards, Emporia, 
Kansas. 

10. Starch’s Silent Reading Tests. These tests are similar to
Brown's Silent Reading Tests described above, except that differ- 
ent selections are used for the different grades. Address Daniel 
Starch, University of Wisconsin, Madison, Wisconsin. 


Standards: Median Scores in Starch Silent Reading Tests
(Attained at the Close of the Respective Years)

Grade or Year                    I     II    III   IV    V     VI    VII   VIII
Speed (words per second)         ...   ...   2.1   ...   2.8   3.2   3.6   4.0
Comprehension (words written)    15    20    24    28    33    38    45    50


11. Starch’s English Vocabulary Tests. These tests are lists
of one hundred words, each selected at random from a dictionary.
The child is asked to check the words of whose meaning he is
certain, and to write the meaning after the words of which he is
in doubt. The score is the percent of words thus checked or cor-
rectly defined. These tests measure the extent of the pupil’s vocab-
ulary, regardless of the value of the words. Address Daniel Starch,
University of Wisconsin, Madison, Wis.

The following are tentative standard scores for the various
years as determined from tests made in four schools (Ref. 333):

           Elementary                High School            University
Years      4    5    6    7    8     1    2     3    4     1    2     3    4
Scores     30   33   36   39   42    45   47.2  50   53    56   58.5  61   63
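The vocabulary score is a plain percentage: words checked as known plus words correctly defined, over the length of the list. A minimal sketch, counting checked words at the pupil’s word as the description implies and assuming the standard list of one hundred words; the function name `vocabulary_score` is hypothetical.

```python
def vocabulary_score(checked: int, correctly_defined: int, list_length: int = 100) -> float:
    """Percent of the list the pupil checked as known or defined correctly."""
    if list_length <= 0:
        raise ValueError("list length must be positive")
    if checked < 0 or correctly_defined < 0:
        raise ValueError("counts cannot be negative")
    if checked + correctly_defined > list_length:
        raise ValueError("more words counted than are on the list")
    return 100 * (checked + correctly_defined) / list_length


print(vocabulary_score(checked=30, correctly_defined=9))  # 39.0
```

A pupil who checks 30 words and correctly defines 9 more scores 39, the tentative standard given above for the seventh year.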


12. Thorndike’s Visual Vocabulary Scales. Thorndike is the
author of three visual vocabulary scales: Scale A, Scale A2 and 
Scale B. The latter two represent extensions of the former, and 
were derived by the same method. Scale A2 and Scale B are in- 
tended for use alternately or interchangeably, and each is issued in 
three forms. Each scale consists of graded lists of words, the mean- 
ing of which the pupil is to indicate by assigning the words to 
certain classes. Address Bureau of Publications, Teachers Col- 
lege, Columbia University, New York City. 

Standards: No standards have as yet been derived by the use
of the Thorndike Scale A2 or Scale B with large numbers of public 
school children. In the following table the standards of achieve- 
ment by the use of the Thorndike Scale A (with which the values 
on Scales A2 and B are supposed to be identical) are given, and 
serve as tentative standards for purposes of comparison. The score 
values were obtained by the measurement of the pupils in 18 cities 
in Indiana. (Refs. 336, 342.) 


Median Scores in Visual Vocabulary by the Thorndike Scale A

Grades                III    IV     V      VI     VII    VIII
Median score          4.00   5.26   6.00   6.66   7.29   7.91
Number of children    1650   2095   2028   1860   1625   1313


13. Thorndike’s Scale Alpha and Alpha 2 for Measuring the 
Understanding of Sentences. Scale Alpha 2 is a slightly more elabo- 
rate edition of Alpha. Each scale consists of a carefully graded 
series of paragraphs. Each paragraph is followed by several
questions which the child is to answer as he reads the paragraph. The
pupil’s answers to the questions determine the measure of his com- 
prehension. In the Teachers College Record which describes the 
derivation of this scale, there is given a score card for marking the 
answers to these questions. Address Bureau of Publications, 
Teachers College, Columbia University, New York City. (Refs. 
340, 343.) 

The following table gives the median scores for the pupils in 
18 cities in Indiana, as reported by Haggerty. 


Median Scores in Understanding of Sentences by the Thorndike Scale Alpha

Grades                III    IV     V      VI     VII    VIII
Median                5.48   6.56   7.56   8.46   8.72   9.00
Number of pupils      1650   2095   2028   1860   1625   1313


X. Oral Reading 

1. Gray’s Oral Reading Test. This test consists of twelve paragraphs,
arranged in order of increasing difficulty. The relative difficulty
have been established experimentally. The child’s oral
reading of each paragraph is checked for time, and for each of six 
types of errors: gross errors, minor errors, omissions, substitutions,
insertions, and repetitions. Address William S. Gray, School of 
Education, University of Chicago, Chicago, Illinois. (Ref. 297.) 

2. Haggerty's Visual Vocabulary Tests. For Grades I and II 
these consist of two sheets, one of sight words and the other of
phonetic words selected from the Jones test. The words on either 
sheet are grouped into lists according to difficulty. This difficulty 
was determined by trial with several hundred primary children. A 
value is attached to each word according to its ascertained difficulty. 
The child is asked to pronounce the words aloud; his score is the
value attached to the most difficult list of which he can pronounce 
four out of five words correctly. Two equivalent forms are avail- 
able. Address Bureau of Cooperative Research, University of 
Minnesota, Minneapolis, Minnesota. (Ref. 301.) 
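Haggerty's scoring rule can be written out step by step. This is an illustrative Python sketch only: the list values are made up (the real values come from Haggerty's trials with primary children), and returning 0 when no list is passed is an assumption rather than part of the published test.

```python
def haggerty_score(list_values, correct_counts, words_per_list=5, threshold=4):
    """Score a child on word lists ordered from easiest to most difficult.

    list_values[i] is the value attached to list i; correct_counts[i] is how
    many of that list's words the child pronounced correctly. The score is
    the value of the most difficult list with at least `threshold` of
    `words_per_list` words right. Returning 0 when no list qualifies is an
    assumption of this sketch.
    """
    if len(list_values) != len(correct_counts):
        raise ValueError("one correct count is needed for each list")
    score = 0
    for value, correct in zip(list_values, correct_counts):
        if not 0 <= correct <= words_per_list:
            raise ValueError("correct count out of range")
        if correct >= threshold:
            score = value  # later lists are harder, so keep the latest one passed
    return score


if __name__ == "__main__":
    # Hypothetical values for five lists of increasing difficulty.
    print(haggerty_score([1, 2, 3, 4, 5], [5, 4, 3, 1, 0]))  # prints 2
```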

3. Jones' Visual Vocabulary Tests. Selecting ten of the most 
widely used primers, Jones found the frequency of occurrence in 
all the primers of each word occurring in any of them. He used 
this frequency as a measure of the value of each word. Using the 
values thus determined for each word, lists of words were made up
as tests. The score is the sum of the values attached to the words 
which the pupil can pronounce correctly. Address B. G. Jones, 
Cleveland, Ohio. (Ref. 308.)
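Jones' two steps, valuing each word by its frequency in the primers and then summing the values of the words a pupil pronounces correctly, can be sketched as follows. The three one-line "primers" and the test words are invented for illustration, and splitting the text on whitespace is a simplifying assumption.

```python
from collections import Counter


def word_values(primers):
    """Count how often each word occurs across all the primers.

    Jones used this frequency as the value of each word. Lower-casing and
    splitting on whitespace are simplifying assumptions of this sketch.
    """
    values = Counter()
    for text in primers:
        values.update(text.lower().split())
    return values


def jones_score(test_words, pronounced_correctly, values):
    """Sum the values of the test words the pupil pronounced correctly."""
    return sum(values[w] for w in test_words if w in pronounced_correctly)


if __name__ == "__main__":
    # Three tiny made-up "primers" stand in for Jones's ten.
    primers = ["the cat ran", "the dog ran", "a cat"]
    values = word_values(primers)
    print(jones_score(["the", "dog", "cat"], {"the", "cat"}, values))  # prints 4
```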

4. Price's Oral Reading Tests. These consist of a series of
suitable oral reading exercises for Grades II to VIII, inclusive. Two 
forms are available for each grade. Pupils are scored for words 
mispronounced, words inserted, words omitted, words transposed 
and number of words read. Address Supt. E. D. Price, Enid, 
Oklahoma. 

XI. Spelling 

1. Ayres' Spelling Scale. This scale consists of a list of the
one thousand most frequently used words of the English language. 
These were determined by means of careful analysis of written 
material, ranging from friendship letters to some of our best prose. 
Later the words were classified according to frequency of mis- 
spelling for each of the several grades, and the percent of correct 
spellings for each grade was printed at the head of each list. These 
standards are for the words when used in dictated lists and without 
regard to whether the words have been taught in the respective 
grades or not. 

Strictly speaking, Ayres' Spelling Scale is not a scale or test, 
but a list of words from which tests can be made to measure the 
ability of pupils to spell the foundation words of the English lan- 
guage. The next four tests were constructed by using words from 
Ayres' list. Address Russell Sage Foundation, New York City.
(Ref. 352.)

2. Courtis Standard Research Tests in Spelling. In these 
tests words chosen from suitable columns of Ayres' Scale are em- 
bedded in sentences, and the sentences are arranged so that they 
can be dictated at specified rates, which correspond to the rate of 
writing in the several grades. Each test includes 20 words. The 
standards set by Courtis are slightly lower than Ayres' Standards 
for the same words when dictated as isolated words. Address 
S. A. Courtis, 82 Eliot St, Detroit, Michigan. 

3. The Iowa Dictation Exercise and Spelling Tests. These 
tests, prepared by E. J. Ashbaugh, of the Extension Division, Uni- 
versity of Iowa, consist of twenty words embedded in sentences and 
an equal number to be dictated separately. Each of the sentences
has been constructed so that it is to be written in 30 seconds. Test I 
is for Grades III and IV, Test II is for Grades V and VI and 
Test III is for Grades VII and VIII. Address E. J. Ashbaugh, 
Iowa City, Iowa. 

4. The Nebraska Spelling Test. This test, prepared by Dean
Chas. Fordyce, of the University of Nebraska, consists of twenty 
words embedded in sentences which are to be dictated at rates speci- 
fied for the several grades. The same words are used for all grades. 
Address Dean Chas. Fordyce, Lincoln, Nebraska. 

5. Monroe's Timed Sentence Spelling Tests. These tests dif-
fer from the foregoing in that 50 words are used and no test words
occur at the end of a sentence. This last feature protects the slow 
writer. Test I is for Grades III and IV, Test II is for Grades V 
and VI and Test III is for Grades VII and VIII and for the high 
school. The normal rate of writing was determined by measuring 
the speed of 6,000 Kansas school children. Address Bureau of 
Educational Measurements and Standards, Emporia, Kansas. 

6. Boston Minimum Spelling Lists. These consist of a list
(for each grade) of “commonly used but often misspelled words.”
These words have been standardized for the grades in which they 
are to be taught, and hence constitute lists from which words for 
testing may be selected. Address Department of Educational In-
vestigation and Measurement, Boston, Mass. (Ref. 355.)

7. Buckingham's Spelling Scale. Starting with a list of
about 5,000 words common to at least two out of five spelling books,
Buckingham, by means of an elaborate statistical procedure, selected
two lists of 25 words each. The purpose of the selection was to
“secure words which were easy enough in the third grade and hard
enough in the eighth grade to afford a test in those and therefore
intermediate grades, and which showed regular increases in percent
correct from grade to grade.” The difficulty of each word was de-
termined in terms of a common unit. Since the difficulty of each
word is known, the entire list, or any desired portion of it, may be 
used as a test. Address Bureau of Publications, Teachers College, 
Columbia University, New York City. (Ref. 358.)



EXISTING TESTS AND STANDARDS




8. Jones' Concrete Investigation of the Material of English
Spelling. This bulletin presents the results of an investigation to
determine “what words, ‘grade for grade,’ do children use in their
own free written speech, and which, therefore, they need to know
how to spell.” This list has been used as the basis for the construc-
tion of tests. One such test has been devised by W. W. Phelan,
University of Oklahoma, Norman, Okla., and used extensively in
that state. For the Bulletin, address University of South Dakota,
Aberdeen, South Dakota. (Ref. 370.)

9. National Business Ability Tests. The elementary test con- 
sists of 50 words chosen from Ayres’ list of 542 obtained from the 
examination of two thousand letters. The advanced spelling test 
consists of a list of 50 words which are printed incorrectly. In 
ten minutes the pupil is to write the words correctly. Address 
Sherwin Cody, 189 West Madison St., Chicago, Illinois. 

10. Rice's Spelling Test. This test has a very great historical
importance because it was Rice’s report on spelling at the meeting 
of the Department of Superintendence in 1897 that marks the begin- 
ning of the modern movement for scientific measurement in edu- 
cation. (Ref. 378.) 

11. Starch's Spelling Scales. These scales have a function
which differs from that of Ayres’ Scale or a test made from Ayres’ 
list. The latter measures how well pupils can spell the most com- 
monly used words of the English language, while Starch’s test 
measures the size of one’s spelling vocabulary. The tests consist 
of words selected at random from the non-technical words of the 
English language, with no regard to the frequency with which they 
are used. Address Daniel Starch, University of Wisconsin, Mad- 
ison, Wisconsin. 

Standards: Starch gives the following standards for his tests 
based on their use with over 2,500 pupils. 


Grade                                  I    II   III    IV     V    VI   VII  VIII
Percent of words spelled correctly    10    80    40    61    61                85


These standards are interpreted thus: the average eighth-grade
pupil should be able to spell correctly 85 percent of the non-tech-
nical words of the English language, or 85 of the 100 words in any
one of Starch's tests. (Ref. 384.)
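Because Starch's words are drawn at random without regard to frequency, the percent a pupil spells correctly on one test serves as an estimate of the percent of the whole non-technical vocabulary he can spell. A minimal Python sketch of drawing such a test and scoring it follows; the placeholder word list stands in for the dictionary Starch sampled and is purely hypothetical.

```python
import random


def make_test(non_technical_words, n=100, seed=None):
    """Draw n distinct words at random, ignoring how often each is used."""
    rng = random.Random(seed)
    return rng.sample(non_technical_words, n)


def percent_spelled(test_words, spelled_correctly):
    """Percent of the sampled words the pupil spelled correctly."""
    right = sum(1 for w in test_words if w in spelled_correctly)
    return 100 * right / len(test_words)


if __name__ == "__main__":
    # Hypothetical placeholder vocabulary standing in for a dictionary.
    words = [f"word{i}" for i in range(5000)]
    test = make_test(words, seed=1917)
    # A pupil who spells the first 85 sampled words correctly:
    print(percent_spelled(test, set(test[:85])))  # prints 85.0
```

With 85 of 100 sampled words right, `percent_spelled` returns 85.0, the eighth-grade standard quoted above.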

Tests for Use in the High School 

Certain of the tests described in the preceding pages are in-
tended to be used in the high school, as well as in the elementary
school. These are Test III of the Kansas Silent Reading Tests, the 
Thorndike Scale Alpha for Measuring the Understanding of Sen- 
tences, Starch’s English Vocabulary Test, certain composition 
scales, the copying test, the Trabue Completion-Test Language 
Scales, and Starch’s Grammatical Tests. For the description of 
these tests and the use of them, the reader is referred to the pre- 
ceding pages. 

In addition to these tests, many of the others have been applied 
to high-school pupils. For example, the Courtis Standard Research 
Tests in Arithmetic, Series B, have frequently been given to high- 
school pupils, although many of them were not studying arithmetic. 
However, in applying such tests to high-school pupils it should be 
remembered that the tests were not designed for that purpose, and 
it may be expected that they will not be as satisfactory as when used 
in the way intended. 

I. Algebra 

1. Coleman's Scale for Testing Ability in Algebra. This test
consists of a series of exercises arranged in order of difficulty. Ad- 
dress Supt. W. H. Coleman, Bertrand, Nebraska. 

2. Hotz's First-Year Algebra Scales. These scales include
tests on the following topics: (1) addition and subtraction, (2)
multiplication and division, (3) equations and formulas, (4) graphs,
(5) problems. They were restricted to these topics because the
author felt “that the main business of the work in first-year algebra
was to teach students how to solve typical algebra problems through 
the use of algebraic symbols.” Address Bureau of Publications, 
Teachers College, Columbia University, New York City. 

3. Indiana Algebra Tests. Monroe's Standard Research
Tests in Algebra, described below, were incorporated in this series.
The other six tests of the series were devised by H. G. Childs, of the
University of Indiana. (Ref. 58.)
4. Monroe's Standard Research Tests in Algebra. These con-
sist of a series of six tests. Each of the first five tests is designed
to measure the ability to do one of the operations occurring in the 
solution of simple equations. Address Bureau of Educational 
Measurements and Standards, Kansas State Normal School, Em- 
poria, Kansas. (Refs. 60, 61.) 


Standards: Median Scores for Monroe's Standard Research Tests in Algebra

Test                                        I     II    III     IV      V     VI
Number of pupils                         2077   1993   2107   2127   2198   1992
Speed, number of examples attempted      14.6    5.4   11.5   10.2   11.2    8.8
Accuracy, percent of examples correct      96     41    100     94     77     82
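Each Monroe test is reported in two figures, speed (examples attempted) and accuracy (percent of examples correct). The sketch below shows one way to reduce a class's papers to those two medians for comparison with the table. Taking the median of each pupil's own accuracy, measuring accuracy against the examples attempted, and skipping pupils who attempted nothing are assumptions; the text does not spell out how the published medians were computed.

```python
from statistics import median


def speed_and_accuracy(results):
    """Median speed and median accuracy for one test.

    results holds one (attempted, correct) pair per pupil. Speed is the
    number of examples attempted; accuracy is the percent of attempted
    examples that are correct. Pupils who attempted nothing are left out
    of the accuracy median (an assumption of this sketch).
    """
    speeds = [attempted for attempted, _ in results]
    accuracies = [100 * correct / attempted
                  for attempted, correct in results if attempted > 0]
    return median(speeds), median(accuracies)


if __name__ == "__main__":
    # Hypothetical class results on Test I.
    print(speed_and_accuracy([(14, 14), (15, 14), (16, 15)]))  # prints (15, 93.75)
```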


5. Stromquist's Preliminary Algebra Tests. This series of
tests includes tests upon the following operations: (1) addition,
(2) subtraction, (3) multiplication, (4) division and (5) factoring.
Address University of Wyoming, Laramie, Wyoming.

6. Rugg and Clark Standardized Tests in First-Year Algebra. 
This series includes sixteen tests which are intended to measure all
of the types of exercises in the work of the first year. Address H.
O. Rugg, University of Chicago, Chicago, Ill. (Ref. 66.)


TENTATIVE STANDARDS FOR RUGG & CLARK FIRST-YEAR ALGEBRA TESTS
(Approximate Median Number of Problems Attempted per Minute
for Each Test)

Test                                     2     5     6     7     8    10    11    12    13    14    15
Most efficient school                  4.2  14.5   6.3  13.4   4.6   3.0   4.2   3.5   1.1   7.3   1.4
Average of upper third of 27 schools   3.5  11.6   5.8  12.5   4.1   1.5   3.2   2.9   1.0   5.8   2.5
Ninth school                           3.2  10.7   5.5  11.5   3.9   1.2   2.8   2.6   0.8   5.2   3.9
Average of 27 schools                  3.0  10.4   4.9  11.2   3.6   1.1   2.7   2.4   0.8   4.9   3.1
Poorest school                         2.2   6.5   2.8   7.9   2.4   0.5   1.5   1.4   0.6   2.9

(Approximate Median Number of Problems Right)

Test                                     2     5     6     7     8    10    11    12    13    14    15
Most efficient school                  3.2  13.2   5.4  11.8   4.1   1.4   2.8   2.8   0.9   5.3   2.0
Average of upper third of 27 schools   2.7  11.0   4.8         3.5   0.8         2.0   0.7   8.9   4.2
Ninth school                           3.2  10.0   4.4   9.0   2.9   0.6   1.2   1.7   0.6   3.4   5.9
Average of 27 schools                  2.2   9.7   3.8   9.2   2.9   0.5   1.1   1.4   0.6   2.8   7.9
Poorest school                         1.2   6.1   2.2   4.3   1.7   0.2   0.1   0.5   0.3   0.1



Standards for Tests 1, 3, 4, 9 and 16 will be sent to cooperating schools 
during 1917. 

Score for Test 15 is minutes required to solve one problem. 
7. Thorndike's Algebra Test. This is a series of eight exer-
cises arranged in order of increasing difficulty as determined by the
opinion of competent judges. 

II. Drawing 

1. Rugg's Scale for Measuring Freehand Lettering for Use in
Secondary Schools and Colleges. It consists of a series of 8 samples
of freehand lettering, arranged in the order of increasing merit.
It may be used in measuring the efficiency of a student's work in
freehand lettering. Address H. O. Rugg, School of Education,
University of Chicago, Chicago, Ill. (Ref. 156.)

III. Foreign Language 

1. Brown's Connected-Latin Test. This test consists of a con-
nected passage of Latin, to be interpreted in terms of its thought
content. The pupils are given a specified amount of time in which 
to interpret and write in English as much of the passage as pos- 
sible. The translation is scored by means of a key. 

2. Brown's Latin-Sentence Test. This consists of a series of
Latin sentences ranging from very easy to very difficult. The sen- 
tences are graded and evaluated, and each is assigned a scale value. 

3. Brown's Formal Latin-Vocabulary Test. A list of fifty
isolated words which have been graded and evaluated and a scale 
value assigned to each. The pupils are scored on their ability to 
give correct meanings for the words. 

4. Brown's Functional Latin-Vocabulary Test. A list of
words in the Latin-Sentence Test. The pupils are scored on their
ability to react to these words correctly in their functional rela-
tionships in sentences.

5. Brown's Formal Latin-Grammar Test. This test is made
up of twenty constructions in Latin sentences. The constructions 
are in italics and the pupils are required to name and describe them, 
but not to translate the sentences. 

6. Brown's Functional Latin-Grammar Test. A series of
Latin constructions chosen from the Latin-Sentence Test. The 
pupils are graded on their ability to react correctly to these con- 
structions in their normal settings. 
With the foregoing six tests a survey was made of the work in
Latin in New Hampshire secondary schools. The high schools of 
every city and every large town in the state were tested. Most of 
the private academies and seminaries were also reached. The re-
sults are to be set forth in a document entitled A Study of Ability
in Latin. This will consist of about 300 pages when printed and
will be published by the Bureau of Educational Research of the New
Hampshire Department of Public Instruction, Concord. For fur- 
ther information address President H. A. Brown, State Normal 
School, Oshkosh, Wisconsin. (Ref. 277.) 

7. Hanus' Latin Tests. These consist of four tests for vocab-
ulary, a translation test, and a grammar test. All of these tests
are based on Caesar and Cicero. No words appear in the vocabu-
lary tests “which occur less than one hundred times in Caesar and
Cicero.” The translation test contains only constructions “which
are found at least five hundred times in Caesar and Cicero.” The
grammar test is based on the sentences to be translated. Address
Paul Hanus, Harvard University, Cambridge, Massachusetts. (Ref.
278.)

8. Henmon's Latin Tests. These consist of (1) an easy hun-
dred-word vocabulary test (50 in English and 50 in Latin) con-
taining the words that are common to four widely used first-year
books, (2) a standard vocabulary test of 239 words representing all
the words common to 13 first-year books and to Caesar, Cicero, and
Virgil, and (3) a Latin-sentence test consisting of 30 sentences con-
structed by using none but the 239 words of the Standard Vocabu-
lary Test. Address V. A. C. Henmon, University of Wisconsin,
Madison, Wis.

9. Starch's French Vocabulary and Reading Tests. The vo-
cabulary test consists of 100 French words selected at random from
a French dictionary. The English equivalents of these words are 
given on the test sheet, and the pupil is tested by means of the num- 
ber of English equivalents he can correctly associate with the 
French words. The reading test consists of simple sentences to 
be translated. 

10. Starch's German Vocabulary and Reading Tests. These
are similar to the tests for French. Copies of these tests may be ob- 
tained from Daniel Starch, University of Wisconsin, Madison, Wis- 
consin. 
11. Whipple's German Vocabulary Test. This is a vocabu-
lary test to measure the ability of graduate students to read scien-
tific German in the field of educational psychology. A score of 60 
has seemed to indicate sufficient ability to read psychology and edu- 
cation in German. Address Guy M. Whipple, Carnegie Institute 
of Technology, Pittsburgh, Pa. 

IV. Geometry 

1. Minnick's Geometry Tests. This series of tests is based on
the assumption that the demonstration of a geometrical theorem 
involves the following abilities: (1) the ability to draw the figure, 
(2) the ability to state the hypothesis and conclusion, (3) the ability 
to recall facts concerning the figure, (4) the ability to select and 
organize facts so as to produce the proof. Address J. H. Minnick, 
University of Pennsylvania, Philadelphia, Pennsylvania. 

2. Rogers' Mathematical Tests. These are a series of tests
designed to measure several types of reasoning ability in the field 
of mathematics. The series includes tests on arithmetic and alge- 
bra as well as geometry. Address Bureau of Publications, Teachers 
College, Columbia University, New York City. 

3. Stockard and Bell's Geometry Test. This test consists of
70 questions arranged in 20 groups. “These groups involve draw-
ing figures, naming figures, indicating order of development in dem-
onstration, completing statements, stating of converse, definitions, 
regular polygons, parts of a demonstration, angular relations, area 
of trapezoid, angles in polygons, angles in circles, congruency of 
triangles, similarity of triangles, loci, auxiliary lines, simple con- 
structions, ratio and proportion, algebraic expression of geometrical 
relations, and equivalent construction. The questions are asked in 
such a way that many pupils are able to complete the list in forty 
minutes.” (Ref. 223.)

V. History 

1. Sackett's Scale in Ancient History. This scale consists
of a series of eight tests, or exercises, the questions of which call for 
the essential information in the field of ancient history. The items 
of this information were determined upon the basis of a careful 
examination of an American history text and the judgment of
experts in the field. Address L. W. Sackett, University of Texas,
Austin, Texas. (Ref. 273.)

VI. Physical Training 

1. Rapeer's Scale for Measuring Physical Education. This
is a score card for judging five aspects of the results of physical
education, i.e., health, physiological efficiency, physical develop-
ment, physical ability and mental qualities. Address L. W. Rapeer,
San Juan, Porto Rico. 

VII. Physics 

1. Starch's Tests in Physics. The tests consist of 75 sentences
from which certain words have been omitted. The sentences and 
the words to be omitted have been so chosen that a pupil cannot 
supply the correct words unless he knows certain physical facts or 
principles. The facts, principles, and laws upon which these sen- 
tences are based were determined by examining five widely used 
textbooks. The 102 facts, principles, or laws which were treated by 
all five of the books are the ones which the pupils must know to do 
the tests correctly. Address Daniel Starch, University of Wiscon- 
sin, Madison, Wisconsin. 

Distributing Centers 

Information in regard to particular tests can always be ob- 
tained by writing directly to the authors. Very many of the tests 
and scales, however, can also be secured from various distributing 
centers, principally University Bureaus of Cooperative Research. 
For the convenience of those interested, the addresses of the seven
most important distributing agencies are given below:

1. Bureau of Publications, Teachers College, Columbia University,
   New York City.
   Tests by Thorndike, Stone, Hillegas, Trabue, Woody, Bonser and
   many others.

2. Russell Sage Foundation, Division of Education, New York City.
   Ayres' Writing and Spelling Scales.

3. S. A. Courtis, 82 Eliot Street, Detroit, Michigan.
   Research Tests in Arithmetic (Series A and B), Reading (Series R),
   Writing (Series W), Spelling (Series S).

4. School of Education, University of Chicago, Chicago, Illinois.
   Gray's Reading Tests, Cleveland Arithmetic Tests, Rugg's Algebra
   Tests.

5. University Supply Association, Madison, Wisconsin.
   Tests by Starch and Henmon.

6. Bureau of Educational Tests and Standards, Kansas State Normal
   School, Emporia, Kansas.
   Tests by Monroe, Kansas Silent Reading Tests.

7. Bureau of Cooperative Research, University of Minnesota,
   Minneapolis, Minnesota.
   Tests by Haggerty.

Very often schoolmen who undertake measurement work with-
out previous training meet difficulties in the scoring and tabulating
of the results, or in interpreting them, which they are unable to 
overcome without assistance. Ordinarily, this assistance can be 
had for the asking from any worker in the field, but particularly 
from Departments of Education in State Universities. A letter, or 
better a personal visit, will often not only clear up in a few minutes 
a misunderstanding that otherwise might have led to failure and 
discouragement, but also bring into mutually helpful relations the 
so-called theoretical and practical educational forces of the state. 
The supreme service of educational testing is that it reveals prob- 
lems, stimulates attempts at solution, and affords measures of the 
success of the efforts made, but even the simplest educational prob- 
lems are so important, so complex, that they demand the united co- 
operative efforts of educational workers of every type. 



CHAPTER VIII

RELATED FORMS OF EDUCATIONAL INVESTIGATION 


W. A. AVERILL

State Education Department, Albany, N. Y. 


The broadest interpretation of the title of this chapter includes 
everything in the whole school world which remains over and above 
the measurements of classroom work discussed in the other chapters 
of this Yearbook. It would call for the consideration of a formid- 
able array of topics of all sorts — from the dust content of school- 
room air to the nature and results of educational legislation — a 
range quite beyond the possibilities of this part of the Yearbook. 
Accordingly, the aim of this chapter will be twofold: (1) to call
the attention of superintendents to the different types of investiga- 
tions being made, and (2) where investigations are of a conven- 
tional type, as cost accounting, study of enrolment, etc., to indicate 
present tendencies, rather than to give a detailed statement of the 
entire field. 

Content of the Course of Study. Possibly the most significant 
type of investigation, as well as one immediately related to class- 
room measurement, is the analysis of the curriculum, both in ele- 
mentary and high schools, with special reference to the content of 
the courses of study and to the relative amount of time devoted to 
the different subjects. Ayres’ investigations in spelling have 
pointed the way to the elimination of waste in accomplishing given 
results and have shown which words among the thousands taught 
were proper tasks for the children of the various grades. Similarly, 
the investigations of Bagley and Rugg in history,¹ Jessup in arith-
metic, and others of the same type have served to indicate the meth-
ods by which a scientific curriculum may eventually be constructed.

Pupils' Marks and Scholarship Ratings. Another type of in-
vestigation most intimately related to classroom measurement is the


¹For these and other references consult the bibliography, Chapter XIII.


THE SEVENTEENTH YEARBOOK


study of the marks given pupils by teachers. Two growing tenden- 
cies are worthy of mention; first, superintendents are recording and
studying the marks received by a class in order to discover what 
percentage of the pupils receive superior, satisfactory and poor 
ratings, and they are rating teachers, and making adjustments of 
work, on the basis of comparative percentages of failure; second, 
they are accumulating such data from year to year and subjecting 
them to careful statistical and graphical analysis. That is, school 
men are no longer content to hope and believe that their work is 
going well; they are attempting systematically to secure exact 
knowledge of the effects of their efforts by a scientific study of all 
available data.

Rating Teacher Efficiency. Immediately related to the
measurement of the work of the pupils is the estimate of the teach-
er's ability. The tendency appears to be away from general com-
ment and toward the recording of specific details about the
recitation itself. A rating card of the general type contains such
questions as:

Does the teacher show skill in habit formation? 

Does the teacher show skill in stimulating thought? 

Is the subject matter organized? 

Are proper habits of action developed both in and out of the 
classroom? 

Is the teacher interested in the life of the community? 

A rating card which goes into particulars contains such ques-
tions as:

Number of minutes lost in calling the class? 

Number of minutes lost in distributing material? 

Extent to which the recitation was confined to the text? Re- 
lated to the pupils’ lives and experiences? 

Number of questions which caused the pupils to think before 
answering? Number requiring merely yes-or-no answers? 

Another tendency of teacher rating takes the form of self-exam-
ination, of which the following is an illustration:

Clearing up pupils’ difficulties. 

1. What plan or method have I for discovering the difficulties
which the class as a whole may have, and specific difficulties which 
individuals may have ? For providing the particular help a pupil 
needs to clear up his own difficulty? 
2. When a large enrolment makes it practically impossible to
provide the special help needed by individuals, what is my method
of providing it?

3. Do I appreciate that a pupil may lose a half year or a full 
year as the result of my failure to give him just the help he needs 
at the time he most needs it? 

4. Am I inclined to shift the responsibility for pupils’ failure 
to causes outside myself ? 

5. What evidence have I that I am successful in clearing up 
pupils’ difficulties? 

Still another important type of investigation connected with 
the rating of teachers, and one which is rapidly growing in favor, is 
the study of stenographic reports of actual recitations. 

Promotions and Non-Promotions. The question as to the num- 
ber of pupils promoted at the close of the term arises naturally 
after mention of the teacher’s efficiency. In the limited space of 
this chapter it is enough to call attention to the fact that superin-
tendents are finding it desirable to record separately the following
types of promotion:

1. The mid-term promotion, where the pupil has spent about 
half the regular time in each of two grades. 

2. The double promotion, in which the pupil is made to skip 
an entire grade. 

3. The straight, earned promotion to the next higher grade. 

4. The unearned promotion, meaning that the pupil has not
“passed,” but has been in one grade so many terms that there is
no longer any benefit in keeping him there.

5. The conditional or trial promotion. 

6. The failure, or non-promotion. 

7. The demotion to a lower grade. 

In comparisons of different cities, investigators should have 
these distinctions in mind, otherwise they may not be comparing 
the same things. In other words, the tendency in this field, also, 
is to define more sharply the analysis that is made. 
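Keeping the seven types of promotion separate is mainly a matter of tallying them under their own headings before computing percentages. The following Python sketch does that; the `Outcome` names and the class record are hypothetical.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    """The seven types of promotion record listed in the text."""
    MID_TERM = "mid-term promotion"
    DOUBLE = "double promotion"
    EARNED = "earned promotion"
    UNEARNED = "unearned promotion"
    CONDITIONAL = "conditional or trial promotion"
    FAILURE = "failure (non-promotion)"
    DEMOTION = "demotion"


def outcome_percentages(outcomes):
    """Percent of pupils in each category, keeping the categories separate."""
    if not outcomes:
        raise ValueError("no pupils recorded")
    counts = Counter(outcomes)
    return {o: 100 * counts[o] / len(outcomes) for o in Outcome}


if __name__ == "__main__":
    # Hypothetical end-of-term record for a class of 20.
    record = ([Outcome.EARNED] * 15 + [Outcome.CONDITIONAL] * 2
              + [Outcome.FAILURE] * 2 + [Outcome.UNEARNED])
    for outcome, pct in outcome_percentages(record).items():
        print(f"{outcome.value}: {pct:.1f}%")
```

In the sample record 75 percent of the pupils earned their promotion; a city that counted the one unearned promotion as earned would report 80 percent, the kind of mismatch the distinctions above guard against.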

Teachers' Reasons for Pupils' Failures. The causes of non-
promotion are a well-known subject of investigation and report, but
often the investigation is limited to asking the teachers to write 
opposite the name of each non-promoted pupil a reason for the fail- 
ure. Under these conditions teachers will usually ascribe about 40 
percent of the failures to the pupils’ mentality and 20 percent to 
the pupils’ lack of application. Here, too, the tendency is to push
the analysis further and demand reasons based upon objective 
evidence. 

Elimination. Closely associated with the examination of fig- 
ures relating to promotions, is the consideration of the extent to 
which pupils have dropped out of school before promotion time. 
In the past it has been customary not to count as non-promoted any 
pupils who have dropped out before examination. This improves 
the showing, but in line with modern tendencies it is proving more
valuable in getting at actual school facts to keep a record of the 
following types of withdrawal and to give them their full weight in 
any use which is made of promotion figures. 

1. Transfer to another part of the same public-school system. 

2. Transfer to schools without the local system. 

3. Removal from city with an implied continuance of 
schooling. 

4. Schooling stopped for the current term: 

a. Poor health. 

b. Bona fide cases of “needed at home.”

c. Actual poverty. 

5. School given up as a bad job : 

d. Incapacity.

e. Indifference.

f. Disciplinary. 

Vocational Education. Early investigations of the causes of 
withdrawal from school, particularly those which recorded the state- 
ments of pupils and parents, as in the Minneapolis survey, brought 
out the fact that neither pupils nor parents regarded schooling be- 
yond the grades as of very material aid in gaining a livelihood. A 
few studies were made, correlating wages and salaries with time 
spent in school. Then the rapid growth of manual training, shop 
courses, and definite training for the industries, supplied material 
for a whole literature of vocational investigation and survey. This 
soon passed beyond the content of the curriculum to the industries 
themselves, culminating in the thoroughgoing survey of the Cleve- 
land type, in which the whole city is analyzed vocationally and the 
data, quantitative as well as qualitative, are practically and con- 
structively related to the schools and their program. It is interest- 
ing to note that in this connection, the first chapter of the Bloom-
ington, Indiana, survey report treats of the occupations of the
women and the men employed in the city. 

Ages of Pupils and Progress Through School. It would be 
presumptuous to restate the principles of age and progress research
so clearly outlined by Dr. Leonard P. Ayres in his Laggards in our 
Schools and The Identification of the Misfit Child. The nine cate- 
gories of pupils that result from the possible combinations of three 
age and three progress factors are appearing year by year in a 
larger number of superintendents’ reports. The tendency to get 
away from a liberal margin of “years of normal age,” which would
enable any city to have a comfortable preponderance of “normal” 
pupils is in keeping with the demands of modern supervision and 
research. “To obtain for one city a record which will exceed the 
record of other cities” may serve the purpose of self-congratulation, 
but the more nearly correct picture of the situation which is obtained
by using a single-year age limit for each half-grade (as is
now done in New York State) is far more valuable, if less flattering, 
to the local superintendent. Present practice figures age at the 
time of beginning a given grade or completing it, rather than “be- 
ing in a grade” some time during the school year, and the prefer- 
ence is for the time of beginning. The actual statistics may be 
gathered any time during the early autumn. Bachman advances 
good reasons for figuring ages on August 31, the beginning of the 
official school year; the superintendents of New York State have 
chosen September 15 for the age date. The investigations of age, 
progress, elimination and retardation have become far too numerous 
to list separately in a chapter of this length. A new type of investi- 
gation is, however, to be seen in the occasional attention given in 
these studies to the exceptionally capable pupils who have been al- 
most forgotten in our efforts in behalf of their more unfortunate 
classmates. 
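The nine age-progress categories can be sketched in code. This is a minimal illustration under stated assumptions, not the procedure of Dr. Ayres or of the New York State Education Department: it figures age on September 15 (the New York age date mentioned above), assumes a hypothetical normal age of 6 for beginning grade 1 with a single-year normal age for each later grade (half-grades are ignored for simplicity), and judges progress by comparing the grade begun with the year of schooling begun. All function and constant names are illustrative.

```python
from datetime import date

# Assumption: ages are figured on September 15, the New York State age date.
AGE_DATE = (9, 15)
# Hypothetical: "normal age" for beginning grade 1 is 6 completed years on the
# age date, with a single-year normal age for each later (whole) grade.
NORMAL_AGE_GRADE_1 = 6


def age_on_age_date(birth: date, school_year: int) -> int:
    """Completed years of age on September 15 of the given school year."""
    month, day = AGE_DATE
    age = school_year - birth.year
    if (month, day) < (birth.month, birth.day):
        age -= 1  # birthday falls after the age date
    return age


def age_class(age: int, grade: int) -> str:
    """'young', 'normal' or 'old' for the grade being begun."""
    normal_age = NORMAL_AGE_GRADE_1 + grade - 1
    if age < normal_age:
        return "young"
    if age == normal_age:
        return "normal"
    return "old"


def progress_class(grade: int, year_of_school: int) -> str:
    """'rapid', 'normal' or 'slow', comparing the grade begun with the
    year of schooling being begun (1 for a pupil's first year)."""
    if grade > year_of_school:
        return "rapid"
    if grade == year_of_school:
        return "normal"
    return "slow"


def age_progress_category(birth: date, school_year: int,
                          grade: int, year_of_school: int) -> tuple[str, str]:
    """One of the nine combinations of three age and three progress classes."""
    age = age_on_age_date(birth, school_year)
    return age_class(age, grade), progress_class(grade, year_of_school)


# A pupil born October 1, 1909, beginning grade 1 in his first year, autumn 1916.
print(age_progress_category(date(1909, 10, 1), 1916, grade=1, year_of_school=1))
# ('normal', 'normal')
```

Counting pupils in each of the nine (age, progress) pairs then gives the table that appears in superintendents' reports.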

Cities and Villages Studied in Groups. For the investigation 
of enrolment by grade, elimination, age and progress through school, 
the New York State Education Department has divided the cities, 
villages and union high schools with elementary departments into 
groups based on the elementary enrolment. By means of mechanical
tabulation, the data supplied by several hundred communities
throughout the state are quickly tabulated, analyzed, and returned
to the systems which contributed the information. Returns will
be made, it is believed, in time to be of use to the superintendents
early in the next school term. Beginnings in this form of ‘educational
accounting’ have been made in the principal cities of the
state and in 12 cities with an elementary enrolment of 3,000 to 5,000;
26 with 1,000 to 3,000; 44 villages with 500 to 1,000; 58 with 300 to
500; 227 enrolling 100 to 300 elementary pupils; and about 100 union
schools with fewer than 100 elementary pupils. The superintendents
of these school systems will be glad to exchange data with systems
of like size throughout the country.*

Superintendents willingly receive comparisons of their systems 
with others of like size similarly situated. The first reaction toward 
this matter is an attempt to explain away the retardation and other 
defects; it is only afterwards that the problem of doing away with
it is attacked. Reducing the figures to common denominators of 
size, wealth, foreign elements and shifting population wins the con- 
fidence and cooperation of superintendents and principals and 
paves the way for constructive work. The mechanical tabulation of 
the returns also makes possible the correlation of progress-through- 
school with such factors as teachers’ salaries, principals’ salaries, 
number of pupils per teacher, per capita assessed valuation of the 
school district, number of hours devoted daily to supervision by the 
principal, and in small schools, the number of grades taught by one 
teacher. 
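Once the returns are tabulated, a correlation such as the one described above can be computed. The sketch below uses the Pearson product-moment coefficient, one common choice; the text does not say which method was used. The figures for the five systems are invented placeholders, not data from the New York returns.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson product-moment correlation between two paired lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two paired lists of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical figures for five school systems of like size (not real data).
median_salary = [550, 600, 650, 700, 800]           # teachers' median salary, dollars
percent_over_age = [38.0, 35.5, 33.0, 30.0, 27.5]   # per cent of pupils over age
print(round(pearson_r(median_salary, percent_over_age), 2))
```

A value near -1 would mean that, among these systems, higher salaries go with fewer over-age pupils; it would not by itself show which causes which.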

Investigations Giving Rise to Permanent Records. It is of interest
to follow the usual course of many investigations of the type
mentioned thus far in this chapter. Often the first stimulus to action
is a questionnaire received by the superintendent; next a survey
of classroom work is made with standard tests; then an age-progress
survey follows, and the superintendent begins to incorporate
into his report some of the outcomes. Soon the giving of standard
tests and the statistical study of school records become a part
of his regular work. Finally the items of information necessary 
for these investigations become the subjects of permanent record in 
the superintendent’s office and appear regularly in his report. 

*Address W. A. Averill, State Education Bldg., Albany, N. Y.

Groups of Related Investigations. There should be mentioned
here quite an array of investigations, largely correlated with re- 
tardation, progress and the achievements of pupils as measured by 
standard classroom tests, which for want of space can only be listed. 

Studies having to do with 

1. Physical conditions of the children in school (hearing, 
eyesight, development, etc., medical inspection, school lunches, heat- 
ing, ventilation, school furniture, etc.). 

2. Home conditions (sleep, breakfast, play, home study, out- 
side work, etc.). 

3. Special classes (blind, deaf, mentally defective, tubercular, 
super-normal, special promotion plans, coaching rooms, supervised 
study, etc.). 

4. School organization (school programs, recesses, fatigue, 
vacations, division of school year, kindergarten, junior high 
schools).

5. Compulsory attendance (delinquency, discipline, truancy). 

6. Social and moral welfare.

Organization of City School Systems. Different in type though 
related are investigations dealing with the organization and 
administration of city school systems, as represented by the work 
of Cubberley, Strayer, Thorndike and others. Attention is centered 
on the functions of the different officers of the system, and the 
chief purpose is perhaps to determine the proper alignment of 
the functions of the school board and the superintendent. The 
conception of a school organization in which the superintendent 
supervises the educational work and a coordinate business officer 
manages the finances and school plant, is being abandoned in fa- 
vor of an organization in which the superintendent, as the chief 
executive officer of the board, is, next to the board itself, the head of 
the entire system, and the one to whom all other officers, both business
and educational, are subordinate. Investigations in educational
legislation show this same trend. Legislation recently passed
in New York State gives every city a school organization on the 
plan just mentioned. 

School Buildings. An important series of investigations is 
that dealing with the subject of school buildings. The surveys 
made in Cleveland, Oakland and Milwaukee are typical. The 
grouping of types of school architecture by decades, as in the
Cleveland survey, is a significant modern tendency. Sanitation
and hygiene naturally play a large part in these studies, as do
ventilation, illumination and safety from fire. Related to this
group are occasional investigations of janitor service. 

Extension and Social Service. Much has also been said and 
written about the wider use of school buildings — their use at 
night and in vacations, school-board policies with reference to 
permitting various organizations to use school buildings and 
charging for their use. These surveys, however, lead into the 
field of community welfare and belong rather to the adult world.

The Cost of Educational Work. Finally, no investigation of 
educational conditions or work can be considered complete which 
does not show the cost of obtaining the results achieved. The earlier
investigations of cost were largely devoted to teachers’ salaries;
indeed, for many years “salaries and other expenses” were
about all that could be culled from school accounts. Now, investi- 
gations of school cost fall into three categories: (1) those dealing 
with the classification of payments for school purposes, (2) those 
treating of the accounting procedure which gives the desired 
classification and (3) studies of comparative cost by items, schools 
and cities. 

The present tendencies may be stated as follows:

The school budget is designed according to function, or the 
kind of work done or service rendered. The four main functions 
are (1) regulative and executive service, (2) property (acquisi- 
tion, construction, equipment, maintenance, and operation), (3) 
instructional, or supervision and teaching, and (4) extension and 
social service. 

School moneys are appropriated by main functions only; all
details of expenditure are left to the board of education.

The classification of expenditure involves four things about 
every payment made, namely, (1) the function subserved, (2) 
the character of the payment as a fiscal transaction, (3) the de- 
tailed object of the expenditure and (4) the location in the sys- 
tem to which it is chargeable. 
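The fourfold classification can be pictured as a record that carries all four facts about each payment; totals by main function, the level at which moneys are appropriated, then follow directly. The field names, the sample payments, and the wording of the "character" entries are hypothetical illustrations, not a prescribed scheme of accounts.

```python
from collections import defaultdict
from dataclasses import dataclass

# The four main functions named in the text.
FUNCTIONS = (
    "regulative and executive service",
    "property",
    "instructional",
    "extension and social service",
)


@dataclass(frozen=True)
class Payment:
    function: str    # (1) the function subserved, one of FUNCTIONS
    character: str   # (2) the character of the payment as a fiscal transaction
    obj: str         # (3) the detailed object of the expenditure
    location: str    # (4) the place in the system to which it is chargeable
    amount: float    # dollars


def totals_by_function(payments: list[Payment]) -> dict[str, float]:
    """Sum payments under each main function."""
    totals: dict[str, float] = defaultdict(float)
    for p in payments:
        if p.function not in FUNCTIONS:
            raise ValueError(f"unknown function: {p.function}")
        totals[p.function] += p.amount
    return dict(totals)


# Hypothetical payments, for illustration only.
payments = [
    Payment("instructional", "current expense", "teachers' salaries", "Lincoln School", 1200.0),
    Payment("instructional", "current expense", "textbooks", "Lincoln School", 85.0),
    Payment("property", "capital outlay", "new furnace", "Washington School", 640.0),
]
print(totals_by_function(payments))  # {'instructional': 1285.0, 'property': 640.0}
```

The same records can be summed by location or by object instead, which is what makes comparisons between schools possible.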

The accounting procedure itself has been simplified by abandoning
large unwieldy forms in favor of smaller loose-leaf sheets;
by using flexible code symbols at the top of columns; by using
voucher-checks and warrants in place of the old forms of voucher
jacket; and by the standardization of the size of all blanks and
cards to the new dimensions used commercially.

Accounts have been rendered more valuable by including a 
register of orders, a register of accounts payable and an appropri- 
ation ledger showing at any time the unencumbered balance as 
well as the unexpended balance. 
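The difference between the two balances can be made concrete. In this sketch, which assumes the usual meaning of the terms, the unexpended balance is the appropriation less payments actually made, while the unencumbered balance also deducts orders placed but not yet paid (the register of orders). The class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AppropriationAccount:
    """One line of a hypothetical appropriation ledger."""
    appropriation: float
    expended: float = 0.0  # payments actually made
    outstanding_orders: dict[str, float] = field(default_factory=dict)  # order no. -> amount

    def place_order(self, order_no: str, amount: float) -> None:
        # An order placed encumbers the appropriation before any money is paid.
        self.outstanding_orders[order_no] = amount

    def pay(self, order_no: str, amount: float) -> None:
        # Paying an order removes its encumbrance and records the expenditure.
        self.outstanding_orders.pop(order_no, None)
        self.expended += amount

    @property
    def unexpended_balance(self) -> float:
        return self.appropriation - self.expended

    @property
    def unencumbered_balance(self) -> float:
        return self.unexpended_balance - sum(self.outstanding_orders.values())


acct = AppropriationAccount(appropriation=5000.0)
acct.place_order("101", 300.0)
acct.place_order("102", 450.0)
acct.pay("101", 300.0)
print(acct.unexpended_balance, acct.unencumbered_balance)  # 4700.0 4250.0
```

The unencumbered balance is the one that shows how much may still safely be ordered.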

Finally the information obtained is reduced to proper units 
for comparative analysis — per capita units, for instructional func- 
tions and square and cubic foot units for building, maintenance 
and operation functions. Graphic presentation is employed wherever
possible.
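Reduction to comparative units is simple division, as this sketch shows. Which enrolment figure serves as the divisor for per-capita costs (average daily attendance, say, or average membership) is not specified in the text and is left to the user; all the figures below are hypothetical.

```python
def unit_costs(instruction_cost: float, pupils: float,
               operation_cost: float, floor_sq_ft: float,
               volume_cu_ft: float) -> dict[str, float]:
    """Reduce totals to per-capita and per-foot units for comparison."""
    return {
        "instruction per pupil": instruction_cost / pupils,
        "operation per square foot": operation_cost / floor_sq_ft,
        "operation per cubic foot": operation_cost / volume_cu_ft,
    }


# Hypothetical figures for one school building.
costs = unit_costs(instruction_cost=42_000, pupils=1_200,
                   operation_cost=6_000, floor_sq_ft=40_000,
                   volume_cu_ft=600_000)
for unit, value in costs.items():
    print(f"{unit}: ${value:.3f}")
```

Only costs reduced to the same unit can fairly be set side by side for different schools or cities.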

Modern Office Methods and Equipment. As a subject for present
and future investigation, mention should be made of the equipment
of superintendents’ and school board offices and “modern business
methods” of handling clerical work and routine. The adding
machine, the calculating machine and the slide rule will enable
one or two clerks to perform a surprising amount of statistical
work. The T-square and ruling pen are easily mastered
tools for graphic presentation. Time for research is often to be
gained by a reorganization of office routine and the elimination of
unnecessary records and procedures. Finally, larger systems can 
solve many of these problems by resorting to the mechanical tab- 
ulation of statistics, both educational and financial. 

Conclusion. The breadth of the field presented by educational 
investigations as a whole has made their mention in this chapter 
disconnected and their discussion, when possible at all, so brief 
as to be perhaps inadequate. After all, we must revert to the sub- 
ject of the other chapters of this Yearbook — the measurement of 
the educational work itself — as that which is most important, be- 
cause it is most closely related to the purpose of the schools, the 
education of the children who attend them. Other types of in- 
vestigation should serve to throw all possible light on the local 
system in which given educational results appear. They should 
be of aid in effecting that interpretation of classroom work which 
will make for fair criticism and constructive suggestion. In fine, 
these related forms of educational investigation stand as an in- 
terpretative background to the more special measurements that 
mark the achievements of pupils in the schools. 



CHAPTER IX 

STATISTICAL TERMS AND METHODS 
B. R. BUCKINGHAM

Educational Statistician, State Board of Education, Madison, Wisconsin. 


As educational reports multiply, it becomes increasingly evi- 
dent that there is undesirable variation in both their method and 
language. A standard procedure and terminology will be helpful, 
both to makers and to readers of reports. Moreover, the applica- 
tion to education of statistical methods developed in other fields 
needs to be made clear. It is the purpose of this paper to state 
what appears to be the best practice and to note the assumptions 
on which the practice is based. 

In certain sciences a higher degree of accuracy is required 
than is possible with a single measurement. Accordingly, many 
measurements are made of the same magnitude, and their average 
taken as the true amount. It was early ascertained that these 
measures showed a certain definite arrangement, in accordance 
with which values at or near the average of all the measures were 
of greatest frequency, and measures which differed from the average
were less frequent the greater their difference from it. Further,
it was found that the number of measures greater than the
average was likely to be equal to the number less than the average
by the same amount. The graphical representation of this
kind of series took the form which we have lately become familiar
with as the ‘normal’ or ‘probability’ surface (see Figure 2).
Measures which differed from the average were thought of as
being in error, and this led to the development of what has been
called the “theory of error,” a theory involving the symmetrical
arrangement of measures of the same thing about the true measure
as represented by their average.

The assumption was next made that there was an analogy 
between a series of measurements of the same thing and a series
of single measurements of each of a number of individuals alike
in some important characteristics. This analogy was found to
work out very well for biological data. Economic data, on the
other hand, do not appear to afford the same analogy, and since,
prior to the recent application of statistics to education, these
methods had been used mainly by biologists and economists,
statisticians were divided into two schools — the one adhering to 
the doctrine of the theory of error, the other rejecting it. The bi- 
ologists, broadly speaking, belong to the former and the econo- 
mists to the latter school. 

The evidence is strong that educational measurements tend 
to resemble those of biology in their structure, i. e., that the theory 
of error applies. Published results often fail to reveal this, but 
when such is the case, it is almost always due to scanty data, or 
to the selection of the children, or to the measuring instrument 
itself. 

It is to be understood that the validity of the statements in the 
succeeding pages of this chapter rests on the assumption that edu- 
cational measurements substantially conform to the arrangement 
which exists among many measurements of the same thing, i. e., 
that they conform somewhat closely to the theory of error, and 
exhibit approximately ‘normal distribution.’

A variable is a quantity which under the conditions imposed 
may assume different values throughout the discussion. In re- 
peated measurements of the same thing, the obtained measures 
differ through ‘error.’ In education, where we make numerous
single measurements of different pupils grouped together on the 
basis of some common characteristic, each different value con- 
stitutes a value of the variable. The number of problems solved 
correctly by seventeen girls in an upper eighth-grade class was as 
follows: 9, 3, 7, 8, 5, 6, 7, 6, 5, 7, 5, 6, 4, 8, 7, 4, 5. Each different 
number in this series is a value of the variable “performance of 
upper eighth-grade girls in solving specified problems in arithmetic.”
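The seventeen scores just quoted can be tallied to show each distinct value of the variable and how often it occurs, which anticipates the frequency table described later in the chapter. The sketch uses Python's standard `collections.Counter`.

```python
from collections import Counter

# Problems solved correctly by seventeen upper eighth-grade girls (from the text).
scores = [9, 3, 7, 8, 5, 6, 7, 6, 5, 7, 5, 6, 4, 8, 7, 4, 5]

tally = Counter(scores)
for value in sorted(tally):  # the distinct values of the variable, 3 to 9
    print(value, tally[value])
```

Seven distinct values (3 through 9) occur among the seventeen measures.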

Attention must be given to the nature of the units in which 
the variable is expressed. They may indicate mid-points or lower 
limits, and they are influenced by the accuracy of the measurements
upon which they are based. In the Nassau County Supplement
to the Hillegas scale, the samples have the following values: 0,
1.1, 1.9, 2.8, 3.8, etc. If a person reports to the nearest scale value,
he is using each value as a mid-point, and his rating of a composition
as 1.9 will mean that it is nearer to 1.9 than it is to 1.1 or to
2.8; in other words, his 1.9 will mean from 1.5 to 2.35. If, however,
he rates a composition as 1.9 which has at least the merit of the
scale sample at 1.9, but not as much merit as the sample at 2.8, 
he is using the scale-value as a lower limit, and his 1.9 will mean 
from 1.9 to 2.8. If he reports values between those given on the 
scale such that a composition slightly less meritorious than the one 
at 1.9 would be given a value of 1.8 or 1.7, and one slightly more 
meritorious than the one at 1.9 would be given a value of 2.0 or 2.1, 
then he is attempting to secure a greater degree of accuracy, and 
his 1.9 would mean 1.85 to 1.95. Thus, the same rating may have 
materially different meanings. It is necessary, therefore, that 
the units be accurately defined and that they be given the same 
meaning throughout a given discussion. 
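The two readings of a scale value can be computed from the scale itself. In this sketch (function names hypothetical), the mid-point reading runs halfway to each neighbouring sample, while the lower-limit reading runs up to the next sample. Only the first five Nassau County values quoted above are used, so the ends of this short list have no neighbour and are marked `None`.

```python
from typing import Optional

# The first five sample values of the Nassau County Supplement (from the text).
SCALE = [0, 1.1, 1.9, 2.8, 3.8]


def midpoint_interval(scale: list[float], value: float) -> tuple[Optional[float], Optional[float]]:
    """Range meant by `value` when each rating is the nearest scale value.
    None marks an end of this list where no neighbouring sample is given."""
    i = scale.index(value)
    lower = (scale[i - 1] + value) / 2 if i > 0 else None
    upper = (value + scale[i + 1]) / 2 if i + 1 < len(scale) else None
    return lower, upper


def lower_limit_interval(scale: list[float], value: float) -> tuple[float, Optional[float]]:
    """Range meant by `value` when it is used as a lower limit."""
    i = scale.index(value)
    upper = scale[i + 1] if i + 1 < len(scale) else None
    return value, upper


lo, hi = midpoint_interval(SCALE, 1.9)
print(round(lo, 2), round(hi, 2))         # 1.5 2.35
print(lower_limit_interval(SCALE, 1.9))   # (1.9, 2.8)
```

The same rating of 1.9 thus stands for two different ranges, which is the point of the paragraph above.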

A series in statistics is a grouping of the obtained measures
of the variable by steps or classes. According to the nature of
the units, series may be either discrete or continuous. A discrete
series is composed of separate integers. If we record the number
of children in each class in a school system, all of our measures
will be of this nature. For example, between classes of 20 children
and classes of 21 children there can be no measures. A continuous
series, on the other hand, is one which is capable of an indefinite 
degree of sub-division. The height of children, their weight, the 
time they take to do a given task, their general intelligence, their 
ability in specific ways, all of these when measured yield continu- 
ous series. We may represent the height of a child as 60 inches, or 
his mental age as 10 years. In these cases, however, our measures 
are only short-hand expressions for ranges whose mid-point is 
the measure as reported. Persons reported as 60 inches in height 
may be anywhere between 59.5 and 60.5 inches. Most measurements
in education yield continuous series. Even when series,
taken at their face value, are discrete, such as the numbers of problems
solved, they should be treated as continuous series, because
in the abilities which they are supposed to indicate, no gaps occur.
To say that in a certain class six pupils got 10 examples right, 
only indicates that they had an ability which permitted them to 
work 10 examples correctly, but not 11. Their abilities might be 
more accurately represented, if we had material with which to 
do it, by 10, 10.27, 10.5, 10.84, etc. 

Measures when first received are unorganized, and no definite 
impression may be obtained unless their number is very small. 
The first procedure is to divide them into classes. For example, 
700 measures were obtained from eighth-grade boys who wrote 
Test No. 15 of a series of “Progressive Spelling Tests.” The test
consisted of 50 words. Clearly 700 measures in a haphazard arrangement
convey almost no information as to how well these
boys performed. We may classify the measures, as in Table I, by 
indicating the number of boys who spelled no words correctly, one 
word correctly, two words correctly, etc. This table consists of 
51 classes. 


TABLE I

Distribution of 8th-Grade Boys According to the Number of Words They
Spelled Correctly, “Progressive Spelling Tests,” No. 15

No. of Words   No. of    No. of Words   No. of    No. of Words   No. of
   Correct     Pupils       Correct     Pupils       Correct     Pupils

      0            7          17          15           34          22
      1           14          18          20           35          15
      2            9          19          19           36          16
      3            6          20          18           37          12
      4           11          21          25           38          20
      5           13          22          20           39          13
      6           18          23          20           40          11
      7           11          24          14           41          14
      8           11          25          15           42           7
      9           13          26          20           43           9
     10            9          27          23           44           5
     11           13          28          22           45           2
     12            8          29          22           46           3
     13           17          30          25           47           4
     14           12          31          19           48           2
     15           19          32          24           49           1
     16           14          33          18           50           0

                                                     Total         700


Although Table I brings the measures into an orderly arrange- 
ment, it does not show the structure of the series as clearly as does 
Table II which is derived from it, and which exhibits but 10 classes.

TABLE II

Derived from Table I

No. of Words   No. of
   Correct     Pupils

    0-5           60
    6-10          62
   11-15          69
   16-20          86
   21-25          94
   26-30         112
   31-35          98
   36-40          72
   41-45          37
   46-50          10

   Total         700


In this table each group of measures, e.g., 0 to 5, 6 to 10, etc.,
is called a class.* For the class 6 to 10, 6 is called the lower limit,
and 10 the upper limit. The class interval is the difference between
the upper and lower limits when the class is treated as continuous,
the class 6 to 10 covering the span from 6 up to 11. In this instance
the class interval is 5 words. The number of measures within each
class interval is called the frequency of the class, and its
conventional symbol is “f.” The entire tabular arrangement, consisting
of classes and their frequencies, is called a frequency table or
frequency distribution.
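Passing from Table I to Table II is a matter of summing frequencies over wider classes. The sketch below uses the Table I frequencies as read from the printed table (several entries in its last column are inferred so that the class totals of Table II and the grand total of 700 agree) and the class limits of Table II, the first class holding six values as the footnote notes. Names are illustrative.

```python
# Frequencies from Table I: index = number of words spelled correctly (0 to 50).
TABLE_I = [
    7, 14, 9, 6, 11, 13, 18, 11, 11, 13, 9, 13, 8, 17, 12, 19, 14,      # 0-16
    15, 20, 19, 18, 25, 20, 20, 14, 15, 20, 23, 22, 22, 25, 19, 24, 18,  # 17-33
    22, 15, 16, 12, 20, 13, 11, 14, 7, 9, 5, 2, 3, 4, 2, 1, 0,           # 34-50
]

# Class limits of Table II (inclusive whole-number scores); first class 0-5.
CLASSES = [(0, 5)] + [(lo, lo + 4) for lo in range(6, 50, 5)]


def regroup(freqs: list[int], classes: list[tuple[int, int]]) -> list[int]:
    """Sum single-step frequencies into wider classes."""
    return [sum(freqs[lo:hi + 1]) for lo, hi in classes]


table_ii = regroup(TABLE_I, CLASSES)
for (lo, hi), f in zip(CLASSES, table_ii):
    print(f"{lo:>2}-{hi:<2} {f:>4}")
print("Total", sum(table_ii))
```

Changing `CLASSES` (for instance to intervals of 3 or 10) shows at once how the regularity of the frequencies depends on the choice of classes.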

On the one hand, it is desirable in the interest of accuracy that 
the number of classes of a frequency table be large, in order that 
the intervals may be small. Obviously, it is much more accurate to 
say that 7 pupils spelled no words correctly, 14 spelled one word 
correctly, etc. (Table I), than to say that 60 pupils spelled from 
0 to 5 words correctly (Table II). On the other hand, it is desirable
that the number of classes be small enough to afford a generalized
or typical result. A good rule to follow is: arrange the data
in as many classes as will secure the greatest regularity of frequencies,
i.e., in the number of classes for which the frequencies will, as
nearly as possible, increase to a maximum near the middle of the range
and then decrease to a corresponding minimum. While the number
of classes in Table I, according to this criterion, is too large, it 
might not be too large if we had five or six thousand measures. 

The graphic representation of the frequency table is called a 
frequency surface or frequency polygon. Figure 1 is a frequency 

*There is a slight irregularity in the class intervals in that the first
class contains six units.

surface corresponding to Table II. Observe that the measures,
grouped by classes, are arranged along the base line. The height of 
each column erected above a given class corresponds to the fre- 
quency of the measures for that class. These frequencies may be 
read from the vertical scale at the left of the figure. 
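Without drawing instruments, a rough frequency surface can be printed as text. This hypothetical helper draws one horizontal bar per class of Table II, one mark for every four pupils (remainders dropped), so the bars run sideways rather than upright as in Figure 1.

```python
LABELS = ["0-5", "6-10", "11-15", "16-20", "21-25",
          "26-30", "31-35", "36-40", "41-45", "46-50"]
FREQS = [60, 62, 69, 86, 94, 112, 98, 72, 37, 10]  # frequencies of Table II


def text_histogram(labels: list[str], freqs: list[int], per_mark: int = 4) -> list[str]:
    """One line per class: label, a bar of '#' marks, and the frequency."""
    return [f"{label:>5} | {'#' * (f // per_mark)} {f}"
            for label, f in zip(labels, freqs)]


for line in text_histogram(LABELS, FREQS):
    print(line)
```

The longest bar marks the class 26-30, with 112 pupils.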

In Figure 1 the columns rise on either side of the tallest column 
with an approach to regularity. If, instead of 50 words, we had 
500, so that we could make 501 classes, and if the number of pupils, 
(i. e., of measures) were indefinitely increased, the perimeter of 
Figure 1 would approach the form of a smooth curve having a single 
peak near the center and sloping toward the base line to the right 
and left. 



Figure 1. Frequency of Correct Spellings

Horizontal distances represent numbers of words correct; vertical
distances represent numbers of pupils. Data from Table II.

Figure 2. Normal Surface of Frequency

The base line of such a curve is called the abscissa. Along it 
are represented the values of the variable. The perpendiculars 
from the curve to the abscissa are called ordinates. They represent
the frequencies of the values of the variable. The longest perpen- 
dicular is called the maximum ordinate, and the measure repre- 
sented on the abscissa at the foot of it is the most frequent one. 
If a curve is of the symmetrical form shown in Figure 2, the maxi- 
mum ordinate divides the area between it and the abscissa into two 
equal parts. A curve of this form is variously called the Gaussian 
curve, the curve of error, the probability integral, or the normal 
curve of frequency. 

If a smooth curve were drawn to fit as closely as possible the 
frequency surface of Figure 1, this type form (Fig. 2) would not be
accurately reproduced. Spelling ability among eighth-grade
boys may, nevertheless, be distributed ‘normally.’ The present
state of our knowledge does not permit us to be sure that the
words selected would register all abilities among eighth-grade boys. 
Again, details, not here reproduced, indicate that the number of 
children was too small to bring out the real distribution. One small 
school furnished more than half the children who spelled from 0 
to 5 words correctly, and more than one-third of those who spelled 
0 to 15 words correctly. This unusually low performance causes 
the first three columns of Figure 1 to be unduly high. The rem- 
edy, of course, is to obtain returns from a great many more children. 

Although the arrangement of many measures in a frequency
table and their graphic representation by a frequency surface permit
the mind to grasp the significance of the data far more readily
than is possible without such devices, nevertheless further simplification
is generally desirable. The more numerous the measures
and classes of measures, the greater is the need for concentrated in- 
formation — that is, for a single simple expression which shall con- 
tain in itself a summary of the whole series. This is the purpose 
of averages. 

The term average has both a general and a specific meaning. 
In its general sense it means any expression intended to give, by a
single figure, the general weight or typical measure of a series.*
In its restricted sense, it refers to the arithmetical mean, i. e., to 
the sum of the measures divided by the number of them. This is 
what we generally mean when we speak of the average. In edu- 
cational statistics, the term used to express the general concept of 
average is central tendency (C. T.). As has been stated above, 
series of educational measurements, when properly derived, tend to 
exemplify the theory of error, i. e. (to state the case very crudely), 
there is a part of the series where the measures are most numerous, 
while the other measures become fewer, the more they differ from 
the most frequent measure. Unless this condition is approximated, 
it is not appropriate to speak of a ‘central tendency.’ We may, indeed,
make an arithmetical computation. The average of the annual
enrolments of an institution which is steadily increasing in
size may thus be found and will have its uses, but it will not be 
typical of any tendency in the measures. 
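In its restricted sense the average is the sum of the measures divided by their number. As an illustration, this sketch computes it for the seventeen scores quoted earlier in the chapter, once from the raw list and once from a frequency table (each value multiplied by its frequency); both routes give the same result. Function names are illustrative.

```python
from collections import Counter

# Problems solved correctly by seventeen upper eighth-grade girls (from the text).
scores = [9, 3, 7, 8, 5, 6, 7, 6, 5, 7, 5, 6, 4, 8, 7, 4, 5]


def mean(values: list[float]) -> float:
    """Sum of the measures divided by the number of them."""
    return sum(values) / len(values)


def mean_from_frequencies(freqs: dict[float, int]) -> float:
    """The same average, computed from a table of value -> frequency."""
    n = sum(freqs.values())
    return sum(value * f for value, f in freqs.items()) / n


print(mean(scores))                            # 6.0
print(mean_from_frequencies(Counter(scores)))  # 6.0
```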

Since the nature of educational data permits us, however, to use
the term ‘measure of central tendency,’ it will
be best to use the term ‘average’ in its commonly accepted sense,
as meaning the sum of the measures divided by the number of them.
Some writers call this the “arithmetical mean,” or simply the
“mean.”

The three measures of central tendency (C. T.) most fre- 
quently used are the average, or arithmetical mean (A), the median 
(M), and the mode (Z). The median has been defined as the mid- 
most measure of a series whose measures are arranged in order of 
size, beginning with the smallest or the largest. According to this 
definition, the median is the (N + 1)/2-th measure (N being the number

*See, for example, Zizek, Franz. Statistical Averages.

of measures). There are objections to this conception of the median,
and recent workers have defined it as a point on the scale on each 
side of which half the measures lie when they are arranged in order 
of magnitude. The significance of this definition is that it em- 
phasizes the idea of a point on the scale rather than a measure. The 
measures in Table I are arranged in order of magnitude. Half the 
number of measures (pupils) is 350. From the beginning through 
Step 23, there are 342. Eight more measures are needed to complete
the 350. These are all contained in Step 24, which may, therefore,
be called the “median step.” Since the series is to be regarded
as continuous, we must seek the M (the “point on the scale
on each side of which half the measures lie”) somewhere in Step 24,
i.e., between 24 and 25. We now assume that the 14 measures entered
at “24” in the table are uniformly distributed between 24
and 25. Since we need 8 of them, the M will be 8/14 of the way from
24 towards 25, that is, 0.57 of the step beyond 24. The step is 1;
therefore the M is 24.57. If we count from the other end of the
table, we obtain the same value, thus checking the computation:
above Step 24 lie 344 measures, so 6 of the 14 in Step 24 are needed,
and 25 - 6/14 = 24.57.*
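The interpolation just described reduces to one rule: the M equals the lower limit of the median step plus the fraction of that step's measures still needed, times the step. This minimal sketch (function name hypothetical) reproduces the chapter's figures. For Table II it treats the class 21 to 25 as running from 21 up to 26, with 277 pupils (60 + 62 + 69 + 86) below it and 94 in it.

```python
def median_by_interpolation(n: int, below: int, f_step: int,
                            lower: float, width: float) -> float:
    """Point on the scale with n/2 measures on each side.

    n      -- total number of measures
    below  -- measures lying below the median step
    f_step -- measures in the median step, assumed spread uniformly across it
    lower  -- lower limit of the median step
    width  -- size of the step (class interval)
    """
    needed = n / 2 - below
    if not 0 <= needed <= f_step:
        raise ValueError("the n/2-th measure does not fall in this step")
    return lower + needed / f_step * width


print(round(median_by_interpolation(700, 342, 14, 24, 1), 2))  # Table I: 24.57
print(round(median_by_interpolation(700, 277, 94, 21, 5), 2))  # Table II: 24.88
print(round(median_by_interpolation(701, 342, 14, 24, 1), 2))  # 701 measures: 24.61
```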

If the data were available only as in Table II, the M would be
found to be 24.88, instead of the more correct 24.57, as computed 
from Table I. Thus, the greater regularity of Table II is gained 
at the usual sacrifice in accuracy. 

Series may have but single measures at each value, and there 
may be more or less wide gaps in the values. If the number of 
measures is odd, M is the middle one ; if even, it is the average of 
the two middle measures. For example, if the expenditures per 
pupil for 35 cities are listed in the order of their amounts, M is the 
expenditure of the 18th city. If there are 36 cities, M is the average
of the expenditures of the 18th and 19th cities.
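For an ungrouped list the rule is easily applied by machine; Python's standard `statistics.median` follows the same convention (the middle value for an odd count, the average of the two middle values for an even count). The per-pupil expenditures below are invented for illustration.

```python
import statistics

# Hypothetical per-pupil expenditures (dollars) for seven cities, in no particular order.
costs = [41.20, 36.75, 52.10, 38.40, 44.90, 47.30, 39.85]
print(statistics.median(costs))  # 4th of the 7 sorted values: 41.2

# An eighth city makes the count even; the median is then the
# average of the 4th and 5th sorted values.
print(statistics.median(costs + [60.00]))
```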

The median may be used when the items have been merely 
ranked, instead of measured. Thus a teacher may arrange the com- 
positions of a class in the order of their merit according to her 

*If there were 701 measures made, let us say, by including one pupil who
spelled 37 words correctly, we should find the 350.5th measure. We should
then need 8.5 of the 14 measures at Step 24, and M would be 24.61.

judgment.* The middle one (or the two middle ones, if there are
an even number of them) is the typical performance of the class. 
Such typical samples may be compared with others similarly ob- 
tained at later dates, and thus a showing of the progress of the 
class may be made. To add to the definiteness of this, the median 
samples may be measured by a composition scale. 
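The ranking by paired comparisons mentioned in the footnote to this paragraph, followed by choosing the middle paper or papers, can be sketched as follows. The function `judge` is a hypothetical stand-in for the teacher's judgment of which of two papers is better; here it merely compares invented numeric merits.

```python
from itertools import combinations
from typing import Callable, Sequence


def rank_by_paired_comparisons(papers: Sequence[str],
                               better: Callable[[str, str], str]) -> list[str]:
    """Compare every pair once, give the better paper a preference mark,
    and order the papers by their number of marks (best first)."""
    marks = {p: 0 for p in papers}
    for a, b in combinations(papers, 2):
        marks[better(a, b)] += 1
    return sorted(papers, key=lambda p: marks[p], reverse=True)


def median_papers(ranked: Sequence[str]) -> list[str]:
    """The middle paper, or the two middle papers if the count is even."""
    n = len(ranked)
    mid = n // 2
    return [ranked[mid]] if n % 2 else [ranked[mid - 1], ranked[mid]]


# Hypothetical merits standing in for the teacher's judgment.
merit = {"Ann": 5, "Ben": 2, "Cora": 4, "Dan": 1, "Eva": 3}


def judge(a: str, b: str) -> str:
    return a if merit[a] > merit[b] else b


ranked = rank_by_paired_comparisons(list(merit), judge)
print(ranked)                 # ['Ann', 'Cora', 'Eva', 'Ben', 'Dan']
print(median_papers(ranked))  # ['Eva']
```

With n papers this calls for n(n - 1)/2 comparisons, which is practical for a single class.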

It was pointed out above that the nature of the units in which 
measures are expressed must be clearly apprehended and con- 
sistently adhered to. Failure to do this introduces error in all 
the results of statistical analysis. An illustration of such an error 
made in the computation of medians may be found in the report 
of the Salt Lake City Survey. In the Table (p. 140) showing the 
distribution of composition scores by grades, failure to take ac- 
count of the nature of the unit resulted in reporting medians which 
were too small by an amount greater than the average annual im- 
provement from grade to grade. The teachers rated the composi- 
tions according to the original Hillegas scale. The values of the 
samples on the published scale were 0, 183, 260, 369, 474, 585, 675, 
772, 838, and 937. In reporting the ratings of the teachers, how- 
ever, these values were called respectively 0, 1, 2, 3, 4, etc., and in 
computing the median these latter values were the ones used, and 
they were called the mid-points of their respective class-intervals. 
Accordingly, when a teacher rated a composition as 3 (really 369),
she was regarded as placing it between 2.5 and 3.5. As a matter 
of fact, she was placing it as nearer to 369 than to 260, or 474, and 
the class-interval was therefore from 314.5 to 421.5. The median 
for the 4th grade was found to lie four-tenths of the way into the 
class called “3” in the table, and, since the lower limit of this class 
was taken as 2.5 and the interval as 1, the median was reported as 
2.9. The lower limit, however, for the class was really 314.5 and 
the interval 107. This yields a median of 357.3 (314.5 107 X 0.4) 

or, shifting the decimal point for convenience and expressing the 
result correctly to one decimal place, 3.6, instead of 2.9 as reported. 
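The correction can be checked with a short sketch. It assumes, as the argument above does, that rating a composition at a given sample means it is nearer to that sample's value than to either neighbour. Each class-interval therefore runs half-way to the neighbouring samples. The helper names are illustrative.

```python
# Values of the samples on the original Hillegas scale, as listed in the text.
HILLEGAS = [0, 183, 260, 369, 474, 585, 675, 772, 838, 937]


def class_limits(values, i):
    """Limits of the class-interval for scale sample i.

    A rating of sample i means "nearer to values[i] than to either neighbour",
    so the limits are the points half-way to the neighbouring samples.
    """
    if not 0 < i < len(values) - 1:
        raise ValueError("only interior samples have two neighbours")
    return (values[i - 1] + values[i]) / 2, (values[i] + values[i + 1]) / 2


def interpolated_median(lower, upper, fraction_into_class):
    """Point lying the given fraction of the way through the class-interval."""
    return lower + (upper - lower) * fraction_into_class


lower, upper = class_limits(HILLEGAS, 3)                 # (314.5, 421.5)
print(round(interpolated_median(lower, upper, 0.4), 1))  # 357.3, i.e. 3.6 after shifting
print(round(interpolated_median(2.5, 3.5, 0.4), 1))      # 2.9, the figure reported
```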

*This may be done very accurately by making all possible comparisons of
two papers at a time and giving the better paper of each pair a preference
mark. The paper having the greatest number of preference marks is the best
paper; the one having the next greatest is the next best, etc.

The mode is the most frequent value in a series. As such it is
the most evident measure of C. T. It is what most people think of
when they speak of the average. A teacher, if asked her pupils’ 
average age, will probably reply by giving the most frequent age. 
The mode, therefore, is peculiarly dependent for its significance, 
and indeed for its existence, on the structure of a series. There 
may, indeed, be more than one mode in a series, but in educational 
work the presence of two or more modes indicates a probable fault 
in the procedure. 

We may distinguish the empirical mode and the theoretical 
mode. The former is obtained by inspection. It is taken as the 
mid-value of the class-interval containing the largest number of 
measures. It, therefore, depends on the size and position of the 
classes in the frequency table. This suggests that by varying the 
size and position of classes we may obtain different values of the 
mode and by combining them, arrive at a more accurate value of it 
than is possible with a single determination. This method is
described in most of the textbooks.*
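A small sketch shows how the empirical mode shifts with the size and position of the classes. The scores are hypothetical. The tie-breaking rule is an arbitrary choice of ours. The textbook procedures for combining several such determinations are not reproduced here.

```python
from collections import Counter


def empirical_mode(scores, width, start):
    """Mid-value of the class-interval holding the most measures.

    Classes run start <= x < start + width, start + width <= x < start + 2*width, ...
    Ties are broken in favour of the lowest class (an arbitrary choice).
    """
    counts = Counter((s - start) // width for s in scores)
    k = max(counts, key=lambda c: (counts[c], -c))
    return start + (k + 0.5) * width


scores = [10, 11, 11, 12, 13, 13, 13, 14, 15, 16, 18]  # hypothetical series
for width, start in [(2, 10), (3, 9), (3, 10)]:
    print(width, start, empirical_mode(scores, width, start))
# 2 10 13.0
# 3 9 13.5
# 3 10 14.5
```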

The theoretical mode is not obtained directly from the data, 
but is the mode which would result from an indefinitely large num- 
ber of measurements sub-divided into very minute classes. Its most 
satisfactory determination is based upon fitting an ideal frequency 
curve to the actual series. The value of the variable corresponding 
to the maximum ordinate of the fitted curve is the theoretical mode. 
Its determination involves some rather advanced work in mathe- 
matics, and it is not being used in educational measurements. A 
substitute for this method has been suggested and is said to work 
well. According to it 

Mode = Average - 3 (Average - Median) 

The average for the series given in Table I is 23.91. The median, 
as shown above, is 24.57. The formula just cited yields 25.89 as 
the value of the mode. 
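The substitute formula is easily applied. The short function below (its name is ours) reproduces the figure just given. The relation is usually attributed to Karl Pearson, and it is expected to hold only for series that are moderately skewed.

```python
def approximate_mode(average, median):
    """Mode estimated from the relation Mode = Average - 3 (Average - Median)."""
    return average - 3 * (average - median)


print(round(approximate_mode(23.91, 24.57), 2))  # 25.89
```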

*See Bowley, Arthur L., The Elements of Statistics, London, 1907, pp.
llSff.; also King, Wilford I., The Elements of Statistical Methods, New York,
1912, pp. 122-125.

When the performance of pupils is recorded in the form of
rates, e.g., of words read per second or letters written per minute,
a measure of central tendency sometimes used is not the average
of the rates, but their harmonic mean. This may be defined as the
reciprocal of the average of the reciprocals of the recorded
measures. Suppose five pupils read at the following rates per minute:
A, 80 words; B, 100 words; C, 120 words; D, 140 words; and E,
160 words.* The average of these rates is 120 words per minute.
The reciprocal of each rate expresses the actual time required by
each pupil to read one word. Thus, A required 1/80 min., or
0.0125 min., to read one word; B’s time was 0.01 min.; C’s,
0.00833 min.; D’s, 0.00714 min.; and E’s, 0.00625 min. The
average of these is 0.00884 min. It shows the average time required to
read one word. The reciprocal of this average of reciprocals is
1/0.00884, or 113, the average number of words read per minute
as computed by finding the harmonic mean. Note that we have
found the average of the reciprocals of the rates and taken the
reciprocal of the result. The average rate by this method is 113
words, while the average found by using the rates directly is 120 words.
The harmonic mean is always less than the average unless all the
rates are equal. It is clear, therefore, that results found by
different investigators are comparable only when computed by the
same method.
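The computation of the harmonic mean can be followed step by step in Python, using the five rates from the example. Python's standard `statistics.harmonic_mean` is used only as a cross-check.

```python
from statistics import harmonic_mean, mean

rates = [80, 100, 120, 140, 160]  # words per minute, pupils A to E

times = [1 / r for r in rates]        # minutes needed to read one word
average_time = sum(times) / len(times)
by_reciprocals = 1 / average_time     # harmonic mean of the rates

print(round(by_reciprocals))          # 113
print(mean(rates))                    # 120
# The standard library gives the same harmonic mean directly.
assert abs(harmonic_mean(rates) - by_reciprocals) < 1e-9
```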

On practical grounds it is difficult to see why we need the har- 
monic mean as an alternative method in computing the central 
tendency of rates. There is no essential difference between rates 
per unit of time and certain economic measures such as wages per 
day. It is certainly not customary to express the central tendency 
of the wages of a group of workmen by using the harmonic mean. 
In the judgment of the writer, the introduction of this method in 
educational reporting serves no useful purpose. 

*Gray, William S., Studies of Elementary-School Reading through Standardized Tests, p. 15.

A measure of C. T., although indispensable, by no means sufficiently
represents a series. It gives only one of the two chief characteristics
of the series. The second is the closeness with which the measures group
about the C. T. The measure of this characteristic is called the measure
of variability, or dispersion. There are three such measures in common
use, and one of them should be used and reported, not only for its own
sake, but also as a criticism of the measure of central tendency. All
measures of variability, like those of C. T., are expressed in units of
the series. For example, if the units of the series are words spelled,
the C. T., as well as the variability, will be expressed as a certain
number of words.

The average deviation (A. D.), also called the mean deviation or mean
variation, is simply the average of the amounts by which the measures
differ from the average, median, or mode. Several writers suggest that,
on theoretical grounds, the A. D. should be computed from the median,
because when so computed for any series it is at a minimum. People in
general do not grasp the meaning of the A. D., or of any other measure
of variability, as readily as they do a measure of central tendency. The
A. D. may be thought of as the amount by which every measure of the
series might differ from the C. T. without influencing it.
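Here is a minimal sketch of the average deviation, computed about either the average or the median. It uses a hypothetical lop-sided series. It illustrates the point that the A. D. comes out smaller when taken from the median.

```python
from statistics import mean, median


def average_deviation(measures, centre):
    """Average amount by which the measures differ from a chosen C. T."""
    return sum(abs(x - centre) for x in measures) / len(measures)


measures = [1, 2, 3, 4, 10]  # hypothetical, skewed series
print(average_deviation(measures, mean(measures)))    # 2.4 (from the average, 4)
print(average_deviation(measures, median(measures)))  # 2.2 (from the median, 3)
```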

The standard deviation (S. D. or σ) is another measure of
variability. It is found by squaring the difference between the in- 
dividual measures and the C. T., adding these squares, dividing by 
the number of measures, and extracting the square root of the quo- 
tient. The S. D. is theoretically the best measure of variability — 
at least when the series is ‘normal’ or nearly so. Its meaning, however,
is not apparent to the lay reader, and it is difficult to explain
in terms of the series. If, in Figure 2, perpendiculars are drawn 
from the two points on either side of the maximum ordinate at 
which the curve changes from concave downward to concave up- 
ward, the distance along the base line from the foot of the maximum 
ordinate to the foot of either perpendicular represents the S. D. 
Between these perpendiculars a little more than two-thirds of the 
area of the curve is included. In series, therefore, which approxi- 
mate the normal type, we may expect about two-thirds of the meas- 
ures to differ from the C. T. by not more than the standard devia- 
tion. The S. D. is at a minimum when computed from the average ; 
and according to the best practice, it is therefore taken from that 
measure rather than from the median or mode. 
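The procedure for the S. D. translates directly into code. The series is hypothetical. `statistics.pstdev` is used only as a check, since it also divides by the number of measures. The last line computes the share of the normal curve lying within one S. D. of the centre, which is the "little more than two-thirds" mentioned above.

```python
import math
from statistics import mean, pstdev


def standard_deviation(measures):
    """Square the deviations from the average, average the squares, take the root."""
    avg = mean(measures)
    return math.sqrt(sum((x - avg) ** 2 for x in measures) / len(measures))


series = [2, 4, 4, 4, 5, 5, 7, 9]           # hypothetical series, average 5
print(standard_deviation(series))            # 2.0
print(math.isclose(standard_deviation(series), pstdev(series)))  # True

# Share of a normal curve lying within one S. D. of the centre.
print(round(math.erf(1 / math.sqrt(2)), 4))  # 0.6827
```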

The quartile deviation (Q) is half the range within which the
middle half of the measures lie. In its computation one finds the
three points in the range which divide the number of measures into
four equal parts. These points are found in exactly the same way
as the median. The points on the range at which these divisions
fall are called the quartiles. The first, or lowest, is called the
lower quartile, or 25-percentile (Q₁); the third, or highest, is
called the upper quartile, or 75-percentile (Q₃). The second one,
of course, divides the measures into two equal parts and is the
median; it is not, therefore, referred to as a quartile. The
difference between the upper and lower quartiles is the range within
which half the measures lie. This difference divided by 2 is Q. In the
case of normal distributions, it is called the probable error (P. E.).
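The quartiles can be located in a frequency table by the same interpolation as the median. In this sketch the table is hypothetical, and each step is assumed to run from its lower limit up to the next step. The function name is ours.

```python
def point_below(freq, fraction, width=1):
    """Point on the scale below which `fraction` of the measures lie.

    `freq` maps the lower limit of each step to its number of measures; each
    step runs from its lower limit to lower limit + width, and the point is
    found by interpolation, exactly as for the median.
    """
    n = sum(freq.values())
    target = fraction * n
    below = 0
    for lower in sorted(freq):
        f = freq[lower]
        if f and below + f >= target:
            return lower + width * (target - below) / f
        below += f
    raise ValueError("fraction must lie between 0 and 1")


freq = {10: 2, 11: 4, 12: 8, 13: 4, 14: 2}    # hypothetical table, 20 measures
q1, md, q3 = (point_below(freq, p) for p in (0.25, 0.5, 0.75))
print(q1, md, q3)        # 11.75 12.5 13.25
print((q3 - q1) / 2)     # 0.75 (this is Q)
```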

It is important that the student of educational statistics should 
understand not only that A. D., S. D., and Q are measures of varia- 
bility, but also that they are themselves important units of amount, 
and that for some purposes they replace the units in which the 
measures were recorded. Suppose that a group of eighth-grade 
pupils is tested in arithmetic and in handwriting. If a certain 
pupil scores 17 examples correct in arithmetic, and Quality 75 in 
handwriting, according to the Ayres scale, it is impossible to say 
how much the one performance is better than the other, unless both 
performances are expressed as a certain number of A. D.’s, S. D.’s,
or Q’s above or below the median of the group. Any one of these
units will serve to express the measures of a series in such a
way as to permit comparison with the measures of another series 
expressed in the same units of variability, even though the original 
units of the two series are different. 
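Expressing each score as a number of S. D.'s from the median of its own group might look as follows. The group medians and S. D.'s are hypothetical; the text gives only the pupil's two scores.

```python
def in_sd_units(score, median, sd):
    """Distance of a score from the group median, measured in S. D.'s."""
    return (score - median) / sd


# Hypothetical group figures; the text gives only the pupil's two scores.
arithmetic = in_sd_units(17, median=12, sd=4)     # 17 examples correct
handwriting = in_sd_units(75, median=60, sd=10)   # Quality 75 on the Ayres scale
print(arithmetic, handwriting)  # 1.25 1.5
```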

If it is desired to compare the variabilities of two different 
series, we can hardly do so by using any of the three measures
described above. If, for example, a group of children is rated in 
composition by the Nassau County Supplement to the Hillegas 
Scale, and in spelling by means of a list of 100 words, since the 
possible range of achievement in composition is only from 0 to 9, 
while in spelling it is from 0 to 100, the variability in the latter 
case will appear to be much greater than in the former. Under such 
circumstances, it is customary to divide the measure of variability 
by the C. T., the result being the so-called coefficient of variation. 
Generally speaking, it is an error in method to compare the meas- 
ures of variability of two series unless the series are expressed in 
the same units and have approximately the same C. T. 
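The coefficient of variation can be sketched as below. It takes the S. D. as the measure of variability and the average as the C. T. This is one common choice; the text does not fix which measures to use. The ratings are hypothetical and chosen so that the two series differ only in the size of their units.

```python
from statistics import mean, pstdev


def coefficient_of_variation(measures):
    """Measure of variability (here the S. D.) divided by the C. T. (here the average)."""
    return pstdev(measures) / mean(measures)


composition = [4, 5, 6]     # hypothetical ratings on a 0-9 scale
spelling = [40, 50, 60]     # hypothetical scores on a 100-word list
print(round(pstdev(composition), 3), round(pstdev(spelling), 3))  # 0.816 8.165
print(round(coefficient_of_variation(composition), 3),
      round(coefficient_of_variation(spelling), 3))               # 0.163 0.163
```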

Although the analysis of series may be carried to higher de- 
grees of refinement, educational workers report few measures other 
than those of central tendency and variability. No small space, 
however, in educational literature is devoted to the determination of 
the degree of correlation, or mutual implication, existing between 
paired measures of different traits or performances. Thus, if 100
high-school children are rated in Latin and in algebra, the extent 
to which an individual having high, medium, or low ratings in one 
subject tends also to have high, medium, or low ratings in the other, 
indicates the correlation for the group in question between ratings 
in Latin and ratings in algebra. Various methods have been de- 
vised for giving numerical expression to this correlation between 
two series of paired measures. Two of the methods suggested by 
Spearman, the “Rank Difference” and the “Footrule,” depend
on the extent to which the individuals ranking first, second, third, 
etc., in one series tend also to rank in the same order in the other. 
Such a correspondence in ranking would indicate perfect correla- 
tion, and would be expressed by the integer 1. By another method 
(suggested by Sheppard) each measure in the two series is given a 
sign plus or minus, according as it is greater than or less than the 
central tendency, and the number of times a plus or a minus sign 
in one series goes with the same sign in the other series for the same 
individual is noted. The greater the proportion of “like-signed 
pairs,” the greater is the correlation. The formula which is used 
in connection with this method again yields “1” as the measure of 
perfect correlation, i. e., as the measure obtained when all the pairs 
of signs are alike. 
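Both methods can be sketched briefly. The rank-difference formula below is the usual one for untied ranks. For Sheppard's method we assume the form commonly given, r = cos(πU/N), where U is the number of unlike-signed pairs. We also take the median as the central tendency from which the signs are reckoned. The data are hypothetical.

```python
import math
from statistics import median


def rank_difference(ranks_a, ranks_b):
    """Spearman's rank-difference coefficient, 1 - 6*sum(d^2) / (n(n^2 - 1)).

    Assumes untied ranks 1..n, listed for the same individuals in the same order.
    """
    n = len(ranks_a)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d2 / (n * (n * n - 1))


def unlike_signs(xs, ys):
    """Sheppard's method in its usual form, r = cos(pi * U / N).

    Each measure is signed + if above the median of its own series, else -,
    and U counts the individuals whose two signs differ.
    """
    mx, my = median(xs), median(ys)
    unlike = sum((x > mx) != (y > my) for x, y in zip(xs, ys))
    return math.cos(math.pi * unlike / len(xs))


print(rank_difference([1, 2, 3, 4, 5], [2, 1, 3, 5, 4]))  # 0.8
print(unlike_signs([60, 70, 80, 90], [65, 75, 85, 95]))    # 1.0
```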

Still other methods are used, but the so-called product-moment 
method is the one most commonly accepted. The meaning of the 
formula used in applying this method cannot be made clear in a 
brief description. The reader is referred to any textbook on statis- 
tics for a treatment of it. The measure of correlation yielded by 
this method is called the correlation coefficient, and its symbol is
“r.” To such an extent is the product-moment method the standard
in statistical work that coefficients derived by other methods
are usually converted into it.
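The meaning of the formula is not explained here, but its arithmetic is short. The sketch below computes r from each pair's deviations from the averages of their own series. The paired ratings are hypothetical.

```python
import math


def product_moment_r(xs, ys):
    """Pearson's product-moment r from paired measures.

    r = sum(x*y) / sqrt(sum(x^2) * sum(y^2)), where x and y are each
    individual's deviations from the average of his own series.
    """
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


latin = [1, 2, 3, 4, 5]     # hypothetical paired ratings
algebra = [2, 4, 5, 4, 5]
print(round(product_moment_r(latin, algebra), 3))  # 0.775
```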

In determining the degree of correlation which subsists in a 
given group with respect to two measurable characteristics, it must 
be emphasized that the individual measures, pair by pair, are to 
be used. It is not correct procedure to break the group up into 
smaller sub-groups and to compare the averages in the two charac- 
teristics for each group. Averages obscure individual variations 
and it is precisely the individual variations which are important in
securing a correlation coefficient.
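A hypothetical example shows how the averages of sub-groups can exaggerate the correlation. Six pupils fall into three sub-groups. Within each group, the pupil who is better in Latin is worse in algebra, yet the group averages rise together.

```python
import math


def correlation(xs, ys):
    """Product-moment r from deviations about each series' average."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical (Latin, algebra) pairs for six pupils in three sub-groups.
groups = [[(1, 3), (3, 1)], [(4, 6), (6, 4)], [(7, 9), (9, 7)]]

pupils = [pair for group in groups for pair in group]
individual = correlation([x for x, _ in pupils], [y for _, y in pupils])

averages = [(sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) for g in groups]
from_averages = correlation([x for x, _ in averages], [y for _, y in averages])

print(round(individual, 3), round(from_averages, 3))  # 0.714 1.0
```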

The correlation coefficient as an expression of relationship has 
been uncritically adopted. It is a measure of mutual implication. 
It fails to indicate the extent to which either of the measured char- 
acteristics depends upon the other. It may, however, be used in 
finding other expressions which do so. Thus, statisticians, while 
becoming more critical of the correlation coefficient, are using it to 
determine the two so-called regression coefficients, which permit a 
statement of the change in one characteristic likely to accompany 
a unit change in the other. I found, for example, that the correlation
coefficient between achievement in answering thought and memory
questions in history was +0.40. This was not nearly so informing as
were the regression coefficients, one of which showed that success in
answering thought questions accompanied success in answering memory
questions to the extent of nearly 0.90, this being the maximum. The
other regression coefficient, i.e., of ‘memory’ on ‘thought,’ was less
than 0.20.
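The two regression coefficients can be computed from paired measures as in the sketch below, which uses hypothetical data. Their product always equals the square of r. With r = +0.40, the product must therefore be 0.16. This agrees with the figures just given, since 0.16 divided by 0.90 is about 0.18.

```python
def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy): change in y per unit change in x, and vice versa."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sxx, sxy / syy


xs = [1, 2, 3, 4, 5]   # hypothetical paired scores
ys = [2, 4, 5, 4, 5]
b_yx, b_xy = regression_coefficients(xs, ys)
print(b_yx, b_xy)              # 0.6 1.0
print(round(b_yx * b_xy, 3))   # 0.6, which is r squared for these data
```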

Most of the foregoing statements, and many others not made 
here, depend to a large degree for their validity upon the existence 
of ‘normal’ distributions. Although the ability of school children
in a given grade is presumably close to normal, measurements ob- 
tained from test material often fail to support the presumption. 
Aside from the fact that the pupils ‘measured' are often relatively 
few and poorly selected, and that results are frequently scored by 
interested or indifferent persons, there is another important reason 
for this — a reason which has to do with the test itself as a measuring 
device. 

It is necessary, in the first place, to understand clearly what it 
is that we are measuring when we test school children. Mr. Courtis 
has pointed out a useful distinction between capacity, ability, and 
performance.* Capacity, as he defines it, represents the possibility
of development, that is, the natural endowment of the individual. It is
his potential ability, and is independent of training. The ability
of an individual is defined as the power actually developed by the
effect of training upon inherent capacity. Performance is the specific
achievement of an individual resulting from his ability as
evinced under the conditions of the test.

*Courtis, S. A., Third, Fourth, and Fifth Annual Accountings, 191S-1916,

Although we may have good reason for supposing that the 
distribution of school ability is closely similar to the normal distri- 
bution, we see in the distinction which has just been made, that what 
we are really measuring is not ability, but performance, and that 
while ability may be constant at any particular time, performance 
will vary according to the conditions under which the ability mani- 
fests itself. This constitutes a real difficulty in educational meas- 
urements, especially when conclusions are drawn concerning in- 
dividuals rather than groups of individuals. It has been proposed 
that as many as twenty-five tests of a single individual ought to be 
made and the average of the results used before we can be reasonably 
certain that we have a reliable measure of performance. This is 
an extreme point of view and seems hardly to be tenable. Courtis 
has shown that if children are given a second test, only about 20 
percent of them will show marked differences in performance. He 
has also shown that giving additional tests does not materially 
change the score for a class as a whole. 

Since, however, what we are measuring is performance rather 
than ability, our tests must be so constructed that performance will 
tend to vary with ability, i. e., to be an index of it. Thus, the con- 
formity of the resulting measures to a normal distribution furnishes 
a criterion for judgment as to the adequacy of the test material. 
Assuming that ability is distributed approximately normally, it 
is highly desirable that the test material should be such as to bring 
out that form of distribution. Too often this is not the case. On 
this ground a great deal of test material now being used in edu- 
cation is faulty. For example, most of the testing which has been 
done in spelling by using Dr. Ayres’ list has resulted in a large 
number of perfect papers, or papers that were nearly perfect. This 
is because the examiner has chosen words which Dr. Ayres found 
possessed little difficulty for the grade in question. Recently, re- 
sults were published from the use of words, each of which, accord- 
ing to Dr. Ayres’ list, had been spelled correctly by 88 percent of 
the children. Naturally, a very great number of the children 
spelled all the words correctly. Under such circumstances, no meas- 
ure of central tendency could be regarded as significant. In the 
fourth grade, 15 percent of the children wrote perfect papers. More 
of them spelled all the words right than spelled any other number of 
them. The test material was too easy to register variation in per- 
formance among the more capable children and therefore failed to 
afford any index of their ability. It is certain that among the chil- 
dren who spelled all the words correctly there was a wide difference 
in ability which would have caused a correspondingly wide differ- 
ence in performance, if the material had been capable of showing it. 
The same inadequacy of test material is revealed when it is too diffi- 
cult for the pupils to whom it is presented. Under such circum- 
stances a large number of children of varying degrees of ability 
will be unable to register any performance in terms of the test. 
For example, in a certain test in arithmetic 145 children out of 943 
were reported as unable to solve a single problem. Among these 
children large differences in performance corresponding to large 
differences in ability would have been indicated by a different test, 
i.e., by one easy enough to enable them to accomplish something.
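A rough check for piling up at the ends of the range might look like this. The spelling table is hypothetical. The last line merely restates the arithmetic case above as a proportion.

```python
def pile_up(freq, max_score):
    """Share of papers at zero and at a perfect score, and whether the most
    frequent score falls at an end of the range."""
    n = sum(freq.values())
    at_zero = freq.get(0, 0) / n
    at_max = freq.get(max_score, 0) / n
    modal_score = max(freq, key=freq.get)
    return at_zero, at_max, modal_score in (0, max_score)


# Hypothetical spelling results on a 10-word test that is too easy.
easy_test = {6: 3, 7: 5, 8: 9, 9: 12, 10: 21}
print(pile_up(easy_test, 10))   # (0.0, 0.42, True)

# The arithmetic case in the text: 145 of 943 pupils solved no problem.
print(round(145 / 943, 3))      # 0.154
```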

The best test material will be found to be midway between that 
which, through being too easy, fails to register variation among the 
pupils of greatest ability, and that which, through being too diffi- 
cult, fails to register variation among those of least ability. A 
spelling test, for example, composed of words, each of which may be 
expected to be spelled correctly by 50 percent of the children in the 
grade taking the test, will afford ideal results. In general, test 
material is most adequate which yields series of measures with an 
approximately normal distribution. Any piling up of measures 
at either the high end or the low end of the range is a defect. No
single body of such material is possible for pupils in several successive
grades. In the writer’s judgment, no test is satisfactory when used
in grades more than two years apart. 
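Choosing words near the 50 percent mark is a simple sorting task, sketched here. The words and their percents correct are hypothetical and are not taken from Dr. Ayres' list.

```python
def pick_words(percent_correct, size, target=50):
    """Choose the `size` words whose percent spelled correctly in the grade
    lies nearest the target."""
    return sorted(percent_correct, key=lambda w: abs(percent_correct[w] - target))[:size]


# Hypothetical percents correct for one grade (not taken from Dr. Ayres' list).
grade_data = {"across": 88, "believe": 52, "receive": 47, "separate": 31,
              "business": 55, "which": 96, "neither": 49}
print(pick_words(grade_data, 4))  # ['neither', 'believe', 'receive', 'business']
```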

Various types of tests are being used to measure educational 
products in terms of the performance of children. Dr. Ayres has 
distinguished three. First, there are those in which the material 
is arranged by steps of increasing difficulty from very easy to very 
hard, such as Gray’s Oral Reading Scale, and Woody’s Arithmetic 
Scale. These may be called difficulty tests. They measure “how 
hard.” Second, we have accuracy tests, or those which measure 
“how well.” Examples of these tests are the various reading tests 
which measure the quality of reading by the accuracy of the
reproduction of it. Third, we have tests in which the material is
intended to be of the same difficulty throughout, and in which the
pupil is required to do as much work as he can in a given time. 
This kind of test may properly be called a speed or rate test. It 
measures “how much.” Doubtless we should measure performance
by all three of these devices. Ultimately we shall have at each point 
of difficulty as set up in the first type of tests, a large number of 
elements — words, problems, passages to read — each of the same dif- 
ficulty. The arithmetic tests used in the Cleveland Survey are an 
approach to this. The criticism of the difficulty tests which have 
a single element at each step is that the difficulties of the elements 
are of different kinds. The criticism of the accuracy and rate tests 
is the varying difficulty of their elements. If, however, we have a 
large number of elements at each level of difficulty, we may then 
apply the kind of measurement used in the second and third types 
of tests, and, within the difficulty in question, we may measure both 
“how well” and “how much.” Under such circumstances, our 
record of performance would, so to speak, consist of three
dimensions. It would also afford a better index of ability.
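The three-dimensional record imagined above could be kept in a structure like the following. The record layout and the figures are hypothetical. The sketch only shows how "how hard," "how well," and "how much" sit side by side for each level.

```python
from dataclasses import dataclass


@dataclass
class LevelRecord:
    """One pupil's work at one level of difficulty (a hypothetical record layout)."""
    level: int        # "how hard": the step of difficulty
    attempted: int    # elements tried in the time allowed
    correct: int      # elements done correctly
    minutes: float    # time allowed

    @property
    def accuracy(self):       # "how well"
        return self.correct / self.attempted if self.attempted else 0.0

    @property
    def rate(self):           # "how much"
        return self.attempted / self.minutes


record = [LevelRecord(1, 20, 19, 4), LevelRecord(2, 14, 11, 4), LevelRecord(3, 8, 4, 4)]
for r in record:
    print(r.level, round(r.accuracy, 2), r.rate)
# 1 0.95 5.0
# 2 0.79 3.5
# 3 0.5 2.0
```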

Meanwhile, many new tests are needed, as well as much re- 
finement of the material of tests now in use. It is unnecessary at 
this time to decide the relative merits of difficulty, accuracy, and 
rate tests. At present, the best criterion to apply to these types 
of tests is that of the regularity of the series of measures derived 
from them. If we are warranted, as it seems we are, in inferring 
that ability is distributed with an approximation to normality, then 
the highest requirement on the part of a test is that it measure per- 
formance in such a way as to be a true index of ability. The con- 
ditions under which tests are administered and the methods by 
which they are rated must be rigorously controlled, but the first 
requirement is that the means of measurement should be sensitive 
instruments, capable of registering variation in the things which 
they presume to measure. We may labor ingeniously at our anal- 
yses of results and may bring from afar the most potent methods 
which statistical theory has evolved, but we shall accomplish little 
if our instruments are as grossly defective as some of those which 
are now being employed appear to be. 



CHAPTER X 

TRAINING COURSES IN EDUCATIONAL MEASUREMENT 


S. A. COURTIS 

Supervisor of Educational Research, Detroit Public Schools

The rapid growth of the movement for measurement and the 
importance of the knowledge gained from surveys and experimental 
studies have led to a demand for definite training in measurement 
work. From superintendent to teachers, educational workers have 
been quick to recognize both the possibilities of the new tools and 
their own need for training. Wherever courses in measurement 
have been offered, the response has been surprisingly large. As a 
result the number of courses available in colleges and universities 
is steadily increasing and their influence is spreading to other fields. 
There are few teachers’ institutes which do not touch upon some 
phase of measurement work, and few normal schools or other teacher 
training agencies which are not beginning to arrange for systematic 
instruction along these lines. At present there is, as might be ex- 
pected, almost no agreement as to the aim or method. The courses 
range from the shallowest survey of the literature of the field, to the 
most highly technical and theoretical courses of university psychol- 
ogy. It may not be out of place, therefore, to give some account of 
the needs of teachers as seen by one who has served for several 
years as director of research in a large city school system. 

The primary aim of all courses for teachers must be to increase 
the teaching power of the students. But teachers are the stuff 
from which principals, supervisors, and superintendents are made, 
so the training courses must be both broad enough, and wide and 
deep enough to give some knowledge of the administrative and 
supervisory uses of tests as well as of their instructional and diag- 
nostic functions. For most teachers, practically the only opportun- 
ity to consider the workings of the school system as a whole and to 
think about educational problems from the supervisory and admin- 
istrative standpoints is that which comes to them during the period 
of training. Therefore, while the measurement work in a normal
school should center around the direct application of measurement 
to the solution of problems of teaching, the broader aspects of the 
results secured must not be overlooked. 

The desirable outcomes of courses in measurement, as of other 
instructional work, are of four different types — (1) point of view, 
(2) knowledge, (3) skills, (4) power. Each of these will be dis- 
cussed in turn. 

1. Of all the possible outcomes of training work in measure- 
ment, none is so important as the effect upon the student’s point 
of view — upon his attitude, not only towards scientific experimenta- 
tion in education, but also towards his educational experiences and 
life. For, too often, normal-school courses result merely in giving 
the prospective teacher the idea that everything in education is 
settled. A textbook is necessarily dogmatic in statement. Courses 
in methods and teachers in charge of training necessarily tend to 
stamp this as right and that as wrong. The training period is
altogether too brief to do more than outline existing ideals and
practices, and indicate those that are “the best.” Even practice
teaching under expert guidance usually serves but to give practical
experience in “handling a room” and in conforming to teaching
routine. Occasionally, a student of exceptional intelligence senses
the contrast between ideals and existing conditions, but even his
experiences seldom stir more than questionings and dull resent- 
ment. 

Teaching, itself, is even more deadening to initiative. In all 
systems of any size the courses of study, and often the method to 
be used, are fixed by higher powers. The equipment, the time allow- 
ances, the size of classes, the types and abilities of the children, are 
all beyond the teacher’s control. What is there in the training or 
experience of the average teacher to develop openness of mind, 
or give any conception of our present educational process as a crude, 
inefficient, wasteful makeshift, established “by guess and by gosh,”
and maintained by convention and social inertia? 

Therefore, it should be the supreme function of measurement 
courses in normal training — as it has proved to be in the educa- 
tional activities in the world outside the school — to give the student 
the scientific attitude of mind, the critical, impersonal, and
inquiring point of view. They must teach him how knowledge arises and
make him feel the cost in time and labor by which the present levels 
of civilization have been attained. They should so clearly reveal 
the part that measurements and scientific methods have played in 
every field of human activity that he will realize their importance 
in education and desire to make himself proficient in their use. 
They must give him bases of criticism and arouse in him such a 
passion for truth that all his life long he will constantly seek to test, 
open-mindedly, disinterestedly, impersonally, the validity of all 
conclusions. They should lead him to regard all educational ac- 
tivities as problems in course of experimental solution, so that he 
will be ever on the watch for those significant variations which make 
for progress. Above all, they must so open his eyes to the wonders 
of the educational process, the possibilities of child development, 
and the relation of progress in education to progress of the world 
that he may have an abiding faith in the dignity and value of his 
profession and a burning zeal to make some contribution to the 
progress of the race. If the courses do this, they will be counted 
successful long after all technical knowledge and skill has been for- 
gotten ; if they do not, they are failures, although their graduates 
know every test by name and are past masters in the art of compil- 
ing tables and graphs. 

2. On the side of knowledge there is much to be learned. The 
student must acquire by actual experience, knowledge of the differ- 
ent types of tests and the advantages and limitations of each. He 
must be familiar with the methods of test and scale construction 
and must have a first-hand experience in giving and scoring the 
more important of the available standard tests. He needs to know 
where to go for standards and comparative data, and he should 
have made a careful, critical study of two or three typical survey 
reports. He must be given, also, some experience with the varia- 
tions in performance caused by changes in conditions and must 
learn how these are to be controlled and interpreted. More than 
anything else, his practical work must serve to emphasize the differ- 
ence in individual children and the need of adjustment of training 
to such differences. 
It is not enough, however, that the teacher in training have a
practical working knowledge of educational tests; he must know 
also tests in related fields. He should be given more than a passing 
acquaintance with physical measurements, such as height, weight, lung
capacity, and their significance, and simple tests of vision and hear- 
ing. He needs also some knowledge of the methods and tests em- 
ployed in measuring intelligence and capacity. Further, he needs 
practice in the construction of rough tests and examinations, and 
the formulation of aims in terms of objective standards. Knowledge 
that is only knowledge, is vain. If the knowledge outlined above 
is merely memory of things read or transmitted by word of mouth, 
it will be of little worth. It should be knowledge derived from per- 
sonal experiences. 

3. The other product of experience is skill, and the successful 
course in educational measurement will have as one of its outcomes, 
the ability to pass certain standard tests in statistical methods ; for 
instance, rate tests in making typical distributions, in finding aver- 
ages and medians, in computing median and standard deviations, 
in calculating coefficients of correlation, and in drawing graphs.
Even more important than these are standard tests of ability to 
use educational scales in a consistent manner. There should be 
training on some of the standardized samples which have been pub- 
lished in writing and composition, until a set of 20 test samples can 
be marked without a variation of more than half a step of the scale. 
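The half-step standard can be stated as a simple check. The sample values and marks below are hypothetical. Only five samples are used, for brevity.

```python
def within_half_step(marks, standard_values, tolerance=0.5):
    """True if no mark differs from the sample's standard value by more than
    `tolerance` steps of the scale."""
    if len(marks) != len(standard_values):
        raise ValueError("one mark is needed for each sample")
    return max(abs(m - s) for m, s in zip(marks, standard_values)) <= tolerance


standard = [3.0, 5.0, 7.5, 4.0, 6.0]   # hypothetical standard values of samples
trainee_a = [3.5, 5.0, 7.0, 4.0, 6.5]  # never off by more than half a step
trainee_b = [3.0, 6.0, 7.5, 4.0, 6.0]  # one sample off by a whole step
print(within_half_step(trainee_a, standard), within_half_step(trainee_b, standard))
# True False
```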

Only as the course results in a measurable proficiency in these 
fundamental skills should it be counted successful. Not all persons 
have the mental qualifications that make possible accurate judg-
ment by means of a scale, and not all have the aptitude for statis-
tical work; but all teachers, without exception, need to have meas-
ured themselves against such objective standards, both that they
may know their own powers or limitations, and that they may un- 
derstand the method and aims of those who have the measurement 
work in charge. The greatest obstacle to harmonious, cooperative 
work in a city school system is the misunderstanding of those who, 
through ignorance, misinterpret the aims of the measurement work. 

4. Finally, the successful courses in measurement should re- 
sult in power to use measurement in the solution of educational 
problems. That is, for full credit, the student should be able to
pass successfully three types of power tests: (1) he should have
planned, measured, and compared the effects of his own and an-
other’s teaching of a specific instruction unit, like the teaching of a
given twenty words in spelling, or a certain case in long division;
(2) he should have devised, executed and interpreted a simple con- 
trol experiment to settle some problem arising out of his practice 
teaching; (3) he should be able to diagnose by means of appropri- 
ate tests, and to prescribe the remedy for, the more common causes 
of failure in the fundamental subjects. No teacher who has had the 
practical experience with tests and testing that will enable him to 
meet these requirements will ever be willing to teach without the 
aid which standard tests afford. 

There are five major articles in the writer’s educational creed: 

1. Basic experiences cannot be transmitted by instruction. 

2. Understanding of the value of tools is best learned by their 
use as a means to an end.

3. Skill in the use of tools is best developed through drill. 

4. Outcomes related to self-interest have greater educational 
potency than abstract aims. 

5. All training work must be adjusted to the varying capaci- 
ties and interests of individual students. 

Expressed in terms of method of teaching a course in meas-
urement, these mean:

1. That the work must be “practical” in character.

2. That it must center around, and have for its purpose, the 
measurement, and improvement, of the practice teaching. 

3. That it must consist mainly of laboratory work with only 
as much lecture work and reading as is necessary to connect the ac- 
tivities of the course with similar activities in the school system 
and in the educational world outside. 

4. That it should consist of a series of graded exercises or 
projects grouped around the main topics in such a way as to pro- 
vide for individual progress and differences in interest. 

It seems futile to attempt to give more specifically a statement 
of precise topics covered, or the length of the course in terms of 
years or hours. Adjustment must everywhere be made to local 
conditions. But the material available and its importance warrant
the prophecy that in a very short time the work in measurement
will be a major subject running through the entire period of train- 
ing. For a six semester course (four semesters in school and two 
actual teaching under supervision), the following might serve as 
a general program. For shorter courses, the actual amount of work 
done would be correspondingly decreased. 

Semester I. General principles of measurement, including 
physical and mental tests, and elements of statistical methods, with 
special emphasis on individual differences, the factors causing vari- 
ation, and the need for control of conditions. 

Semester II. Use of the simpler standard tests in measuring 
the effects of practice teaching. 

Semester III. Measurement by means of the more complex 
tests, with special emphasis upon correlation of abilities and analy- 
sis of complex ability into simple elements. 

Semester IV. Measurement of the results of educational ex- 
periments and of the effect of remedial work with individual chil- 
dren. Emphasis on the diagnostic and supervisory uses of tests, 
both for individuals and for a school system as a whole. Class as- 
signments mainly individual projects, or actual participation in 
surveys and other practical testing work. 

Semesters V and VI. Period of probationary teaching. Re- 
ports to supervisory officers to be based on the use of standard tests 
of the results of teaching effort. 

An appropriate conclusion to this chapter is a comment on 
that anomaly — a course on educational measurement in which no 
use of measurement is made, either as a basis of adjusting the work 
to the abilities of individual members of the class, or as a means of 
measuring the efficiency of instruction. As long as educational 
training concerns itself with superficial conformity to conventional 
practices and hasty surveys of educational literature, little in the 
way of progress can be expected. But if the work in the training 
and practice schools is actually the most efficient in the system, and 
if there, the teacher-in-training learns to see problems as problems, 
and to attack them with the best tools and methods, the training 
which he receives will function all through his professional life. 
Probably upon the character and practical value of the instructions 
given to prospective teachers, more than upon any other one factor, 
depends the success of the movement for measurement and the char- 
acter of its future development. 



CHAPTER XI 

SUGGESTIONS FOR EXPERIMENTAL WORK

GEORGE MELCHER

Director of Bureau of Research and Efficiency,
Public Schools, Kansas City, Missouri


Empiricism and Science in Education 

The United States has no national system of education. Each 
state, each municipality, each school district plans its own course of 
study, determines its own school organization, supervises its own 
school work, chooses its own plan of procedure, establishes its own 
standards. Our nation is called “the world’s greatest experiment
in democracy.” With equal truth it could be called “the world’s
greatest experiment station in education.” All kinds of educational 
problems are being attacked experimentally — problems in the ad- 
ministration, the supervision, the teaching and the financing of 
schools. All varieties of curricula are involved, all kinds of school 
equipment and appliances, and every conceivable method of teach- 
ing the various subjects. 

Many communities believe that their methods of school pro- 
cedure are the best. Yet they are wholly without scientific evidence 
in support of their belief. Thus it is often easy for an enthusiastic 
devotee of some new method of procedure to influence the agencies 
of control to such an extent that an entire school organization is 
revolutionized. Witness the change of a large number of elemen- 
tary schools in New York City to the Gary plan of organization. 
We are not arguing that New York City has not improved her 
schools by the change; for we do not know whether she has or not.
New York City does not know. No one knows. The Gary System 
may have many valuable contributions to make to education. The 
point is that the plan is new and has not been in operation long 
enough to have demonstrated its superiority over the conventional 
school organization. Yet, in spite of that fact, the Gary system has
been and is being enthusiastically copied not only in New York City 
but in many other communities as well. In like manner, countless 
other changes in school organization and school procedure are be-
ing made year after year on the basis of plausible arguments; and
no one is able to demonstrate the wisdom of the change. In many
cases a few years later the school returns to its original plan. Wit- 
ness the more recent action at New York in abandoning the Gary 
plan after spending several million dollars in introducing it. Could 
industrial establishments succeed by such methods?

School authorities are constantly changing methods of pro- 
cedure, and courses of study. This harasses teachers, annoys pa- 
rents, lowers the efficiency of work, and destroys the confidence of 
the public in its school system. Such arbitrary orders are based 
merely on the personal opinions of those in authority. Even up 
to the present time, almost every school policy has been adopted on 
insufficient evidence or upon no evidence at all. In other words, 
personal opinion decides the destinies of school children and de- 
termines school curricula and school organization. Scientific 
methods have been recently introduced into industrial work, busi- 
ness, agriculture, and many other forms of human activity. Even 
the church is having its methods subjected to rigorous criticism 
from the standpoint of efficiency. In the management of the war, 
we see great nations struggling to apply every known scientific 
principle. Certainly education, the most important of human in- 
dustries, cannot afford to neglect any opportunity to test scien- 
tifically its methods of procedure and to demonstrate the value of 
its results. 

In this chapter I am to make a few practical suggestions that 
may be helpful to the teacher, principal, supervisor, or superin-
tendent who desires to contribute his “bit” to educational progress.
These suggestions will be valueless to the persons who still believe 
that we can measure achievements in school work by the personal 
opinion — frequently the offhand and prejudiced opinion — of some 
one individual. We know that improvement in all human activities 
depends very largely upon critical studies of existing conditions and 
upon the establishment thereby of standards to be attained at each 
stage in the development of these activities. It is evident that
progress in education is dependent upon a similar process. The 
inability of school administrators in the past to make reliable meas-
urements has greatly retarded progress. It is true that they have
been able to measure such facts as per capita costs, numbers and 
ages of graduates, the percentages of retardation, etc. Helpful as 
such measurements are, they fail to yield the information upon 
which an adequate appraisal may be based. The recent development 
of means of measurement, however, permits true educational ex- 
perimentation. 

Method of Comparison

In order to determine progress in any phase of school work, 
one must first analyze the situation to discover, as far as possible, 
the various factors that enter into it. These factors must then be 
studied singly and the effect of each determined. It is the analysis 
of these complex situations and the study of each of their factors 
that renders educational measurement so difficult. This is espe- 
cially true in attempting to compare the efficiency of different meth- 
ods of school supervision, organization, or administration. 

It is probable that experimental work can render its greatest 
service by being applied to a study of methods of teaching and of 
the organization of subject matter. These problems are constantly 
before superintendents of schools and one or more of them can be 
selected each year for special study. The mere fact that they are 
being studied will have a beneficial effect upon the school system. 

As we do not have standards in all school subjects, it is nec- 
essary to employ many other means of measuring school achieve- 
ments. One device, concerning which a few suggestions will now 
be offered, is the ‘Control Experiment.’ In order that the con- 
clusions from a control experiment may be of value, a careful plan 
of procedure must be followed. 

First. Analyze your situation and select one factor to be 
studied. 

Second. Select two groups of pupils approximately equal in 
number, in ability, and in previous training. Each group may con- 
sist of one, two, three, or more classes. 

Third. At the beginning of the experiment, carefully meas-
ure the ability of each pupil in the factor under consideration. 

Fourth. Select teachers who are open-minded. If possible, 
select those who know something of scientific methods. Especially 
select those who are willing to cooperate and who appreciate the 
value of following directions. The teachers should be as nearly 
equal as possible in teaching ability. 

Fifth. Prepare carefully detailed instructions for all teach- 
ers who are participating in the experiment. 

Sixth. Except for the one factor that is being studied, keep all 
the conditions in the two groups as nearly equal as possible during 
the progress of the experiment. 

Seventh. Continue the experiment long enough for material 
changes to be made — several weeks, half a year, or even more may 
be necessary. 

Eighth. At the conclusion of the experiment, carefully meas- 
ure the ability of the pupils in the factor under consideration. 

Ninth. Base conclusions as to relative efficiencies upon a study 
of gains and losses of only those pupils who were present through- 
out the period of the experiment. 

Tenth. Allow for the effects of any varying factor other than 
the one under consideration. 

Eleventh. Avoid conclusions from insufficient data. 

Twelfth. Record and preserve the details of the procedure. 
It may prove to be desirable to check the conclusions by repeating 
the experiment. 
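The Ninth step above can be made concrete. This minimal sketch assumes each test's scores are kept in a dictionary keyed by pupil. The names, the scores and the helper `median_gain` are all hypothetical. Any pupil absent from either test is dropped before the median gain of each group is found.

```python
import statistics

def median_gain(initial, final):
    """Median gain for pupils present at both the initial and the final test.

    `initial` and `final` map a pupil's name to a score; a pupil missing from
    either mapping is left out rather than guessed at.
    """
    present = initial.keys() & final.keys()  # set intersection of the two rosters
    if not present:
        raise ValueError("no pupil was present for both tests")
    gains = [final[p] - initial[p] for p in present]
    return statistics.median(gains)

# Hypothetical scores for two groups taught by different methods.
group_a_initial = {"Ann": 50, "Ben": 55, "Cal": 48, "Dot": 60}
group_a_final = {"Ann": 62, "Ben": 63, "Dot": 71}  # Cal left during the term
group_b_initial = {"Eve": 52, "Fay": 54, "Gus": 49, "Hal": 59}
group_b_final = {"Eve": 58, "Fay": 61, "Gus": 55, "Hal": 66}

print(median_gain(group_a_initial, group_a_final))
print(median_gain(group_b_initial, group_b_final))
```

Here group A's median gain (11) exceeds group B's (6.5). With four pupils to a side, however, the Eleventh step applies: a difference of this kind supports no conclusion until the experiment has been repeated.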


Explanation of Steps in Plan 

First. School results are complexes. Many factors usually 
contribute to a single result. In a control experiment only one 
factor should be varied and studied in order to determine the effect 
of that factor; as, for example, the proper distribution of time in 
teaching penmanship. Assume that sixty minutes a week may be 
devoted to the teaching of penmanship in the fifth grade. The fol-
lowing question arises: “Shall this time be divided into five 12-
minute periods a week, four 15-minute periods, three 20-minute
periods, or two 30-minute periods?” An experiment designed to 
answer such a question should provide for four groups of fifth-grade
pupils, each working with one of the suggested distributions of the 
60 minutes per week to be devoted to penmanship. It may be that 
a different distribution of time would be advisable in other grades. 
Hence, similar questions remain to be answered for grades below 
and above the fifth. 
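The design of the first penmanship question can be laid out directly. The sketch below holds the weekly total at 60 minutes and varies only the division into periods, which is the single factor under study. The helper name is hypothetical.

```python
def time_distributions(total_minutes=60, periods_per_week=(5, 4, 3, 2)):
    """Return (periods per week, minutes per period) for each experimental group.

    The weekly total is the same for every group; only its division varies.
    """
    arms = []
    for n in periods_per_week:
        if total_minutes % n:
            raise ValueError(f"{total_minutes} minutes cannot be split evenly into {n} periods")
        arms.append((n, total_minutes // n))
    return arms

for group, (periods, minutes) in enumerate(time_distributions(), start=1):
    print(f"Group {group}: {periods} periods of {minutes} minutes")
```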

A second question and one entirely different from the one above 
is, “What is the number of minutes a week in each grade that
secures best results in the teaching of penmanship?” Note that
this is an entirely different question from the first and cannot be 
studied at the same time without confusion of results. 

Still a third problem: “To what extent can progress in writ-
ing be secured without regular drill periods in the various grades?”
This problem involves entirely different elements from the first and 
second problems and cannot be studied in connection with them. 
One factor only in a situation can be successfully studied at a time. 

A pupil’s ability in any subject is not a single general ability 
but the resultant of several special abilities. For example, in arith- 
metic there are an almost unlimited number of abilities, as ability 
in adding long columns, short columns, small numbers, large num-
bers, etc., different abilities in subtracting, multiplying and divid-
ing whole numbers; different abilities in operations in fractions, as
in adding, subtracting, multiplying, dividing, reducing, etc. A 
given control experiment can deal with only a single ability or a 
single factor in a situation. The science of education will be per- 
fected by solving correctly the numerous small problems involved 
in the educative process, as successful manufacturing is perfected 
by handling correctly each detail of its work.*

Second. In the type experiments to which I have referred
in the preceding paragraph, the number of pupils is not large.
Larger numbers of pupils add to the reliability of the results, al-
though they make more difficult the control of all the factors in-
volved and the maintenance of constant or standard conditions.
Moreover, when the number of pupils is large, the handling of the
statistical work becomes very laborious. Accordingly, more trust-
worthy results may often be secured by the repetition of the experi-
ment in other school systems under standardized conditions.

*The reader is referred to the January, 1912, number of the Teachers College
Record for a detailed study of the “Separate” and “Together” methods
of teaching homonyms. Here there is a splendid illustration of a well-organized
and a well-conducted control experiment. The Teachers College Record for
September, 1913, has another good example of a control experiment on the
question of “Formal English Grammar as a Discipline.”

The groups should be equal in ability because groups having 
low initial scores usually make large numerical gains with less ex- 
penditure of effort than do groups having high initial scores. When 
equal gains are expressed as percentages they seem much greater 
when based on low initial scores than when based on high initial 
scores. Percentage comparisons based upon unequal initial scores 
are thus very deceptive. Suppose that one fifth-grade group has an 
average initial speed of 50 letters a minute and the other of 75 let- 
ters a minute. An average gain of 10 letters a minute in each group 
appears to be an equal gain ; but when expressed in percentage form 
it is 20 percent for the slow group and 13⅓ percent for the rapid
group. Thus the slow group has the higher percentage of gain. 
Yet anyone at all familiar with the teaching and learning processes 
in school knows that a gain of 10 letters in the rapid group is a much 
greater achievement than is a gain of 10 letters in the slow group. 
Hence the importance of having the groups of equal initial ability 
in order to estimate rightly the improvement. It is well that the 
sex distribution in the groups be equal and that the previous train- 
ing and home environment be as nearly equal as possible. Espe- 
cially is this equality important when small groups are used. If 
several classes, selected at random, are used in each group, varia- 
tions in conditions in the different classes will tend to neutralize 
each other. 
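The arithmetic of the handwriting example can be checked exactly with Python's `fractions` module. The function name is hypothetical. Both groups gain 10 letters a minute, and the percentages differ only because the bases differ.

```python
from fractions import Fraction

def percent_gain(initial, gain):
    """Gain expressed as a percentage of the initial score, kept exact as a Fraction."""
    return Fraction(gain, initial) * 100

slow = percent_gain(50, 10)  # Fraction(20, 1): 20 percent
fast = percent_gain(75, 10)  # Fraction(40, 3): 13 1/3 percent
print(slow, fast, float(fast))
```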

Third. In measuring the skill or ability of pupils, care must 
be taken that the tests measure the ability that is being studied, and 
that they are given under uniform conditions. No detail is too triv- 
ial to be considered. For example, a principal who was giving a 
test in arithmetic found a room in disorder because it was tempor- 
arily in charge of a substitute who was poor in discipline. Before 
giving the test he administered a sharp reproof to the pupils. Al- 
ready in a bad attitude, they were humiliated by the reproof and 
failed to respond properly to the test. An equivalent one, given a 
few days later in the same room, secured far better results. Not only 
must directions be followed exactly, and uniformity of procedure be
maintained in all rooms, but also the proper attitude of pupils must 
be secured. Note in Dr. Briggs’ study on “Formal Grammar” that
he administered the test to both groups at the same time, thus in- 
suring uniformity. 

In almost every measurement of school achievement, both quan- 
tity and quality must be considered. Thus in handwriting we meas- 
ure both the rate and the quality of the writing. The customary 
directions for a handwriting test are: “Write as well as you can 
at your usual rate of speed the following sentence. Write the sen-
tence again and again until I say ‘Stop.’” Suppose the teacher
adds as a final suggestion, “Now, do your best, children”; the
rate will then generally be reduced materially and the quality im- 
proved only slightly. In one room this added suggestion reduced 
the usual speed nearly fifty percent. From these comments it is 
evident that the results would be more trustworthy if the same per- 
son gave both the initial and final tests. In measuring school 
achievements, group measurements are usually taken, since forty 
pupils can be tested in a group in about one-fortieth of the time 
required to test each pupil individually. In some cases, as in oral 
reading, individual tests are necessary. Usually the amount of 
time required to give individual tests limits such tests to a very 
small number of pupils. 

In testing groups the time-limit method is generally used; that 
is, rate or speed is measured by the amount of work done in a given 
time. In such tests, absolute uniformity of time is essential. For 
keeping time, a stop-watch is desirable, or at least a watch with a 
second hand. 

When individuals are tested, the work-limit method may be 
used. According to this method each pupil is given the same amount 
of work and his performance is measured by the time required to 
do it. Individual testing by the work-limit method is doubtless 
preferable to group testing by the time-limit method, but it is sel- 
dom practicable in ordinary classroom work. 
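The two methods differ in what is held fixed. In this hypothetical sketch, the time-limit score is the work done per minute within a fixed time. The work-limit score is the time taken on a fixed task, which is converted here to units per minute. That conversion is an assumption made only so the two can be compared; it is not part of the method as described.

```python
def time_limit_rate(units_done, seconds_allowed):
    """Time-limit method: every pupil gets the same time; score is work done per minute."""
    return units_done * 60 / seconds_allowed

def work_limit_rate(units_assigned, seconds_taken):
    """Work-limit method: every pupil gets the same task; the measure is the time taken,
    converted here to units per minute for comparison."""
    return units_assigned * 60 / seconds_taken

# Hypothetical handwriting results, counted in letters.
print(time_limit_rate(units_done=150, seconds_allowed=120))  # group test, 2-minute limit
print(work_limit_rate(units_assigned=100, seconds_taken=80))  # individual test, fixed passage
```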

Fourth. The preceding paragraphs indicate the wisdom of the 
directions with regard to teachers. Unless teachers are open-minded, 
they will vitiate results. A teacher who feels that she must prove 
that the method or book that she is using is superior to the other
method or book will unconsciously destroy uniformity of conditions. 
Unless teachers have a scientific attitude, they will fail to appre- 
ciate the value of many of the requirements in the experiment 
and are likely to feel that a failure to get highly satisfactory re- 
sults is a reflection upon them rather than upon the method. To 
maintain their own reputations as teachers, they must make the 
method successful. In doing so, they prevent the standard-
izing of conditions. Unless teachers follow directions, conclusions
will be worthless.

Fifth. In addition to the detailed instructions in the hands 
of the teachers, conferences with them are also desirable. 

Sixth. As far as possible, all factors except the one under 
consideration should be kept uniform or standardized for all 
groups. For example, in the suggested study on “Time Distribu-
tion in Teaching Handwriting,” no home study or practice on
handwriting outside of school should be allowed, for such practice 
cannot be made uniform. No attention should be given to hand- 
writing in other subjects, otherwise varying factors will be intro- 
duced; all practice in and teaching of handwriting should be done 
during the sixty minutes a week. All other work should be as 
nearly identical as possible. In the period between the initial and 
final testing, the quality of the teaching of the different groups 
should be, as nearly as possible, the same. It is difficult to select 
teachers of equal ability; but by alternating them, or by repeating
the entire experiment with similar alternation, the desired result 
may be secured. Sometimes the same teacher may instruct the sev- 
eral groups at different times. 
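One way to carry out the alternation of teachers mentioned above is a cyclic rotation. This scheme is swapped in for illustration; the chapter does not prescribe it. It assumes as many teachers as groups and blocks of instruction of equal length, and all names are hypothetical.

```python
def rotation_schedule(teachers, groups):
    """Cyclic rotation: in block b, group g is taught by teacher (g + b) mod n.

    Over n blocks every teacher instructs every group exactly once, so
    differences in teaching skill are spread evenly across the groups.
    """
    n = len(teachers)
    if n != len(groups):
        raise ValueError("this sketch assumes as many teachers as groups")
    return [
        {groups[g]: teachers[(g + b) % n] for g in range(n)}
        for b in range(n)
    ]

schedule = rotation_schedule(["Miss A", "Miss B"], ["Group 1", "Group 2"])
for block, assignment in enumerate(schedule, start=1):
    print(block, assignment)
```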

Seventh. The amount of time required for a control experi- 
ment varies with the nature of the experiment. A few weeks may 
be sufficient to show differences in improvement in spelling or 
handwriting. Several months, a year, or even more, may be needed 
to permit a fair estimate of the value of two methods in reading, 
or the value of kindergarten work. 

Eighth. The same care in measuring the ability of pupils at 
the end of the experiment must be used as was used at the begin- 
ning. The final test must be carefully chosen and must be of the 
same difficulty as the initial test. The opportunity for each group
to prepare for this final test should be absolutely equivalent except 
as to the one factor under survey. 

In a general study of the school achievements in a system, it 
is easy to be deceived by two measures taken at different times. 
Suppose, for example, that you test pupils in spelling in October, 
using words from the Ayres’ Scale. Suppose also that the teachers 
of your school have not had access to the Ayres’ Scale prior to 
this time, but that after the test each teacher has a copy of it and 
drills on the words. Even though at the close of the experimental 
period you do not use the same words as in the first test, but select 
from the Ayres’ Scale other words of equal difficulty, it is probable 
that your children will have apparently made great gain. You will
not, however, have measured the real gain in spelling ability, since 
the pupils have been specially prepared for the second test. 

Ninth. The scoring of results must be absolutely uniform 
and the tabulations made in the same way. The greatest uniformity 
is secured by having the same person do all the scoring and tabu- 
lating. If more than one person participates in this work, specific 
directions must be given to insure uniformity of work. The anal- 
ysis and the correct interpretation of the gains and losses in a con- 
trol experiment are the most important parts of the work. The 
value of a method or of a given material is measured by the gain 
which results from its use. The method of computing this gain 
will depend somewhat upon the character of the factor that is being 
studied. In general, in measuring school achievements, the median, 
as a group measure, has the advantage over other measures. It is 
easily and quickly computed and is not unduly affected by extreme 
scores. These extreme scores are always under suspicion. Further- 
more, especially high or especially low individual scores have little 
scientific value in determining, let us say, the efficiency of a method 
of teaching, since what is done by the unusual pupil does not meas- 
ure the value of the method for the great mass of pupils. Very high 
scores or very low scores affect the average much more than they 
do the median. The median, moreover, is likely to represent more 
closely the central tendency. It is often desirable to compute from 
the medians the percentage of gain or loss. When, however, the 
percentage of gain or loss for each pupil is reported, the figures are
misleading, unless the bases (i. e., the initial scores) on which the 
percentages are computed are equal. Besides medians and per- 
centages, the distribution of the scores (the frequency of each 
score) must receive careful consideration. The amount and per- 
centage of gain or loss for all pupils having the same initial score 
are valid figures. The measure of the variability of the gains is 
likewise important.*
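The points about medians, grouping by initial score, and variability can be shown on a handful of scores. In this minimal sketch, the data and `gain_report` are hypothetical. The standard deviation is used as the measure of variability, although the chapter does not name one.

```python
import statistics
from collections import defaultdict

def gain_report(pairs):
    """Summarize (initial, final) score pairs for one group.

    Returns the mean and median gain, the population standard deviation of
    the gains, and the median gain among pupils sharing each initial score.
    """
    gains = [final - initial for initial, final in pairs]
    by_initial = defaultdict(list)
    for initial, final in pairs:
        by_initial[initial].append(final - initial)
    return {
        "mean_gain": statistics.mean(gains),
        "median_gain": statistics.median(gains),
        "spread": statistics.pstdev(gains),
        "median_gain_by_initial": {
            score: statistics.median(g) for score, g in sorted(by_initial.items())
        },
    }

# Hypothetical scores; the last pupil's unusual gain pulls the mean but not the median.
pairs = [(40, 46), (40, 47), (50, 55), (50, 56), (60, 64), (60, 100)]
report = gain_report(pairs)
print(report["mean_gain"], report["median_gain"])
print(report["median_gain_by_initial"])
```

The single extreme gain of 40 raises the mean gain to about 11.3, while the median stays at 6. That is the reason the chapter prefers the median as a group measure.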

Tenth. After every reasonable effort has been made to keep 
the factors uniform during the experiment, varying factors will 
often enter. These must always have consideration and their prob- 
able effects must be estimated. 

Eleventh. A single experiment, even when carefully con-
ducted, is often not conclusive. In the report on the ‘separate’ and
‘together’ methods of teaching homonyms, to which reference was
made above, note how guarded Dr. Pearson is in his statements. 
So far as his experiment is concerned, certain things are true. 
Another experiment might show different results. 

Twelfth. When any conclusion of educational value has been 
reached by a single experiment, it should always be possible to ver- 
ify the result by repeating the experiment at another time or in 
another school under the same standardized conditions. This can 
be done only when a careful record has been kept of the plan of 
procedure and of the important controlling factors in the experi- 
ment. 
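The Twelfth step asks for a record full enough to allow the experiment to be repeated. Below is a minimal sketch of such a record. The field names are only assumptions about what a record might hold, the values are hypothetical, and JSON is just one convenient storage format.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class ExperimentRecord:
    """Hypothetical record of a control experiment, kept so it can be repeated."""
    factor_studied: str
    grade: int
    groups: dict  # group name -> description of its treatment
    initial_test: str
    final_test: str
    duration_weeks: int
    uncontrolled_factors: list = field(default_factory=list)

record = ExperimentRecord(
    factor_studied="distribution of 60 weekly minutes of penmanship",
    grade=5,
    groups={"A": "5 periods of 12 minutes", "B": "2 periods of 30 minutes"},
    initial_test="handwriting rate and quality, October",
    final_test="handwriting rate and quality, January",
    duration_weeks=12,
    uncontrolled_factors=["substitute teacher in group B for one week"],
)
text = json.dumps(asdict(record), indent=2)  # serialize for filing
print(text)
```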

Uses of Control Experiments 

Some of the uses of control experiments have been suggested 
in the preceding discussion. While these uses are various, the fol-
lowing may be particularly mentioned:

a. To determine the relative value of two methods of teach- 
ing. 

b. To determine the relative value of two books or two kinds 
of drill material. 

c. To determine the best distribution of a given teaching time.

d. To determine the amount of time needed to secure optimal
results in a given subject.

e. To determine the relative value of different organizations
of subject matter.

f. To determine the size of classes that will secure optimal
results.

g. To compare two types of school organization.

Possibly these last two are too complicated to handle as an
ordinary control experiment, but the principles of a control ex-
periment should be applied.

*For statistical methods see Thorndike’s Introduction to Mental and Social
Measurements, Science Press, New York; Whipple’s Manual of Mental and
Physical Tests, Warwick and York, Baltimore; and Rugg’s Statistical Methods
Applied to Education, Houghton Mifflin Company.

A control experiment is simple, but requires a scientific atti- 
tude and the constant use of good judgment and common sense. 
Many experiments of great promise have been rendered valueless 
by the neglect of one or two minor details. It will be a great boon to 
education when the various experiments that are being made in its 
field are placed under sufficiently controlled conditions to render 
the conclusions of scientific value. 

Large Problems 

In addition to the simpler problems suggested above, there are 
many others which are too comprehensive for any school district or 
even for any state to solve alone, and which must, therefore, be 
solved, if at all, by the cooperation of many agencies. Investiga- 
tions concerning types of school organization, length of sessions, 
sizes of classes, evening schools, recreation in and outside of school, 
vocational education and guidance, training of teachers, etc., are 
of too general a character to be satisfactorily made by a single 
agency. Various organizations are working on certain of these 
problems. Among them are the following: 

A. The National Society for the Study of Education. 

B. The National Education Association. 

C. The National Association of Directors of Educational Re- 
search. 

D. The American Association for the Advancement of Science. 

E. Various national organizations for teachers and workers 
in a number of fields. 

F. Departments of education in universities and normal
schools. (Some of these institutions have organized cooperative 
bureaus as described in another chapter of this Yearbook.) 

G. State departments, some of which have cooperative bu- 
reaus similar to those in universities and normal schools. 

H. Educational Foundations. 

I. United States Bureau of Education. 

The United States Bureau of Education should have a Division 
of Educational Standards and Measurements. This bureau should 
be in charge of a thoroughly trained scientific student of education, 
seasoned by practical and successful school experience, and should 
become a national clearing house for educational experiment. Sev- 
eral lines of work are open to such a bureau. 

First. Testing of Methods. Two methods of teaching could be 
tried in a score or more of rooms in each of one hundred cities. 
The United States Division of Standards and Measurements could 
prepare outlines of experiments, provide both the initial and final 
tests, compile and analyze data from these one hundred cities and 
deduce conclusions that would have general value in educational 
work. The United States Bureau of Education would thus develop 
an influence in the field of education similar to that exerted by 
the Department of Agriculture in its sphere. Bulletins of the 
United States Bureau of Education would then be read and studied 
by progressive teachers and school administrators as the Agricul- 
tural Bulletins are now read by leaders in agriculture. Such work 
would enable every community to profit by the experience of other 
communities by adopting the really successful plans and avoiding 
or discarding those found to be unsuccessful. Under present con- 
ditions, however, although a given method may prove unsuccessful 
in a score of communities this year, another group of communities 
may try the same method next year, not knowing that it has al- 
ready failed or been discarded. Thus an unsuccessful method, if 
well-advertised, may gain admittance into many school systems only 
to be cast into the educational “junk pile” in time.

Second. Testing drill material, books, and school appliances. 

Third. Conducting at various points in the United States educational
experiment stations, and making preliminary tests of certain
methods before extending their use to other localities. If the
vertical system of handwriting had been tested out for a few years
in a large commercial center, its shortcomings would have been 
discovered, and it probably would not have spread over the country. 

Fourth. Investigating many of the large problems of school 
administration that can be settled only by nation-wide study. These 
studies should be impartial and should not be conducted by en- 
thusiastic advocates of the scheme to be studied. This caution is 
submitted because even the United States Bureau is thought to have 
sometimes fallen under the influence of emotionalists and educa- 
tional promoters. The value of the Gary system, as referred to 
above, is a problem of national importance. The same Bureau 
of Education should be able to make a survey of the Gary Schools 
in operation in various places, and to secure accurate data as to the 
results that are being obtained. In the course of ten or twelve 
years, the country could know the facts with regard to this organiza- 
tion. Kansas City and several southern cities have a seven-year 
course of study in the elementary schools. If it is possible to cover 
the course satisfactorily in seven instead of eight years, the entire 
educational world should be made aware of this fact by an im- 
partial study of the work of these seven-year systems furnished by 
the United States Bureau of Education. The movement to estab- 
lish junior high schools is well under way. The value of this type 
of organization should be established, and no agency could do so 
better than the United States Bureau of Education. 

While, therefore, there are many problems, both large and small,
which await solution, the value of the solution of many of them, it 
is urged, would be greatly enhanced if it were made from an au- 
thoritative, impersonal, and national point of view. 



CHAPTER XII

A LOOK FORWARD 


CHARLES H. JUDD 

Director, School of Education, University of Chicago. 


A paper dealing with the future can justify itself in a scientific 
volume of this kind only when it bases itself on an analysis of 
present conditions and aims to develop as a result of such an 
analysis suggestions for improvements and enlargements of the 
movement under discussion. The reader is warned, therefore, at 
the beginning that this paper looks backward as well as forward, 
in order that justification may be furnished for some of the plans 
urged as desirable for the future. 

One fact which is evident to every student of school problems 
is that the movement toward the development of measurements is 
both promoted and seriously encumbered by a vague popular de- 
mand. Parents have heard that there are methods of finding out 
whether their children can spell or add or read satisfactorily, and 
immediately a clamor arises for a measurement of the local school. 
The demand is likely to be especially keen if there is some parent 
in the community who does not like the superintendent or the prin- 
cipal. Such a parent never for a moment believes that responsi- 
bility for unsatisfactory school results is to be traced to the native 
limitations in the ability of his child or to the home atmosphere 
in which the child grows up. Such a parent is quite certain that 
measurement will detect at some point a lack of perfection, and 
then he knows that his dislike for the school officer will have the 
sanction of science. 

It is little wonder that school superintendents have often been 
afraid to have their schools measured. Especially hazardous is it 
to have the schools measured in many respects in a single survey. 
The number of imperfections sure to be revealed in a general survey 
is appalling. One has the same dread of going to a dentist to have 
his teeth examined, knowing well that one’s teeth are sure to ‘go
to pieces’ under the keen scrutiny of the expert. The case would 
be still worse for most of us if we were obliged to submit the im- 
perfections of our features to the relentless analysis of an expert 
in physiognomy. 

In the presence of a popular demand for the revelation of im- 
perfections and the absolute certainty that imperfections exist, it 
is not difficult to understand why there should be a tendency on 
the part of many school officers to combat the movement toward 
wide-spread measurement. 

The future is sure to develop a new and more wholesome atti- 
tude on the part both of the public and of school officers. Indeed, 
the measurements which have been made up to this time have more 
than justified their cost in effort and money, because they have dis- 
pelled forever the idea that schools should produce a uniform 
product or one that is perfect in its attainments. We all under- 
stand now in definite scientific terms that children are different 
from one another, that the lower grades progress slowly toward 
satisfactory results, that movement within the school is purchased 
at great expenditure of labor on the part of all concerned, and that 
the best we can hope for is improvement, not absolute achievement
of ideals. 

With the theoretical ideal of perfection overthrown, there is 
now an opportunity to set up rational demands. We can venture 
to tell parents with assurance that their children in the fifth grade 
are as good as the average if they misspell fifty percent of a certain
list of words. We know this just as well as we know that a certain 
automobile engine cannot draw a ton of weight up a certain hill. 
No one has a right to make an unscientific demand of the automobile 
or of the school. 

As soon as school officers recognize the fact that measurements 
define for them just how much may reasonably be demanded, they 
will be unafraid of measurements. Indeed, they will learn the
administrative lesson that it is better to know for purposes of ordinary
routine what ought to be demanded than merely to guess at condi- 
tions. The writer once heard a business man put the matter very 
clearly. He was looking at some diagrams that showed the results 
of a study of schools. “This,” he said, “is the sort of thing business
has learned to do. We used to be offended if anyone criticised
our methods or commented on our results. Now we know that our 
best friend is the man who comes and tells us exactly where we 
stand. The one thing business cannot approve is ignorance about 
results. We do not fool ourselves any more, come what will of the 
revelation.”

The school principal who knows in advance where his school 
is weak and where it is strong, is armed against criticism. But 
more than that, he is guided in his future efforts. The purely
negative result, that adverse judgment causes no shock, is of some
importance, but the positive result that the school is stimulated to
improve itself is a matter of supreme advantage. If we can devise 
methods of knowing ourselves, we shall take up the tasks of self- 
improvement with assurance and with discrimination. 

The first prophecy, then, which one can venture with a good 
deal of assurance is that school officers will learn to anticipate
popular demands and will thereby come into possession of information
which will guide them in their own work. 

A second general fact about measurement is that up to this 
time it has dealt with very broad problems and usually has grouped 
together great masses of results. This is seen in the fact that one 
speaks in a large way about medians of thousands of cases. The 
sheer breadth of our studies has intimidated teachers. They feel 
that the machinery is set up to deal with systems of schools, but not 
with their detailed problems. 

It is natural enough that the beginnings of this science should 
concern themselves with broad, remote facts. So it has always been. 
The race developed astronomy first because celestial facts are re- 
mote and on a vast scale. It is only in the latter days of refined 
scientific study that we have come to know details about our own 
bodies and the facts of social organization. 

Thanks to the energy which has been expended in scientific 
work, we have the gross methods well in hand. Refinement of 
methods has begun. Formerly we used to compare school system 
with school system. We shall continue this, but we can now begin 
to use our methods for the more specific study of individual cases. 
Take, for example, the refinements which Ayres has added to his
last writing scale. Along the bottom are the figures which tell the 
teacher in detail how speed and quality are distributed in the normal 
class. There is in this refinement a larger recognition of the teacher 
and of the classroom contact between teacher and pupil. One can 
not help recognizing that measurement is becoming surer of itself 
and is taking up details. The broad first facts have been collected 
and formulated. Now there is a penetration to the deeper problem 
— one is almost tempted to say to the real problem. 

As soon as teachers learn the possibility of using definite meas- 
urement to solve their individual problems, they will share with 
superintendents the attitude which was described above of wanting 
to uncover the exact facts. Here, for example, is a difficult pupil. 
How far is he behind the class at the opening of the school year?
How rapidly does he progress? Whatever the answer, the teacher
will be aided in directing the pupil's work if that answer can be 
known with definiteness and detail. 

Everywhere there are indications that measurements are to be 
used by the teacher. The results of supervised study are being 
measured. The results of different methods of teaching are being 
accurately determined. Thus, different systems of reading, differ- 
ent methods of teaching long division, and different methods of 
manipulating the decimal point are being studied. 

Up to this time, teachers, partly because they shared the dread 
of measurements and partly because they thought of measurements 
as remote, have stood aloof from the movement. Now there appear 
the beginnings of a tendency to make measurement a part of the 
class routine. The arithmetic lesson serves at once as a drill exer- 
cise and as an opportunity for measuring results. The rhythmical 
beating of time in writing helps in the formation of a habit and
tells the teacher what members of the class, if any, are lacking in 
skill. The measurement of rate in reading helps the teacher to 
decide which members of the class require special attention. 

These beginnings mark the path along which the measurement 
movement must travel in the coming years. There is need of new 
energy in devising methods of class routine which will bring to the 
teacher the exact results which will show how successful has been 
the work of the class and of the individual. Those who complain
that the teacher does not have time to make measurements miss the 
point entirely. The teacher often wastes time and effort under exist- 
ing conditions because of ignorance of the direct point where appli- 
cation of teaching energy would be most effective. The right kind of 
classroom measurements, as suggested in the examples cited above, 
do not interrupt class routine at all, but contribute exact methods 
of procedure at the same time that they reveal to the teacher where 
the class stands. 

A conception such as that given in the last paragraphs will also 
clear up another difficulty which teachers sometimes point out. 
They complain that the volume of experimentation is so great that 
the class exercises are disorganized and disrupted. The advice 
which ought to be given to a teacher who makes this complaint is 
that one kind of class exercise should be transformed at a time. 
Methods should be built up in each subject which serve both the 
purposes of measurement and of teaching. This can be done, but 
it requires readjustment and planning. 

The second prophecy which one may venture is, accordingly, 
that measurement will more and more take up details and will be- 
come a common instrument in the hands of the classroom teacher. 

One objection which has been urged again and again against 
measurement is that it deals only with the formal and mechanical 
aspects of education. This objection has nowhere been more defi- 
nitely stated than by Superintendent Horn in his Supplementary 
Survey of Portland Public Schools, where he writes as follows:¹

“It should furthermore be kept in mind that there are many
things about a school system which can never be definitely measured
or stated with mathematical accuracy. Just where the line is to
be drawn between the measurable and the non-measurable elements 
that enter into a school is a matter concerning which there is much 
difference of opinion. In other words, the element of opinion enters 
to some extent even into the matter of the possibility of measure- 
ment. 

“For instance, it is an undoubted fact that any man can go
into a city and count the school houses or the number of the desks. 
Any man can find out the number of teachers employed. Any man 
can count for himself the number of pupils present in a given room. 
“It takes no particular ability to enable an inquirer to find
out just how much money is being spent. If the schools spent nine 
hundred thousand dollars in one year and a million dollars the 
next year, any one can deduce the fact that they spent one hundred 
thousand dollars more the second year than the first year. 

“On the other hand, after a comparatively few such facts have
been definitely ascertained, we come to subjects that cannot be 
measured in mathematical terms, and concerning which there are no 
definite standards. In this realm ideals are not always definitely 
established and opinions are almost certain to vary widely. 

“For instance, if you take two classes, one across the hall from 
the other, who can decide in which of the classes the higher 
ideal of truthfulness or honesty prevails? Who can say which 
teacher is more successful in making the children self-reliant, and to 
what extent? We all know that in such a case, if the two teachers 
are both fairly good, many pupils and patrons will consider one 
the better teacher, while many others will consider the other the 
better. Especially will this be true with reference to such matters 
as the teaching of honesty, industry and self-reliance. Incidentally, 
these very things are recognized as being among the most important 
of all the elements entering into the question of the teacher’s
efficiency. A school that turns out manly, honorable, self-reliant boys
and womanly, efficient girls is likely to be at least a fairly good 
school, no matter what it may do otherwise. A school that fails 
to turn out such pupils can hardly be considered a good one, no 
matter what it may do for its pupils in the way of reading, or 
writing, or arithmetic. And yet these very things, which may de- 
cide between the success or failure of the school, are matters which 
it is almost impossible to estimate accurately, and concerning which 
there may be a wide amount of honest difference of opinion.”

The success of the measurement movement depends on its abil- 
ity to meet this type of objection. 

Some of us might be entirely willing to rest the case after ask- 
ing whether in practical school life anyone ever saw a teacher thor- 
oughly competent in teaching ideals but neglectful of reading and 
arithmetic. The fact is that the conscientious teacher always gives 
attention to both, and the successful teacher is able without omitting 
one to cultivate the other. The theoretical possibility of thinking 
of the two results separately has little significance in dealing with 
real teachers and real schools. Good reading is a school virtue, and
when one has measured good reading, he has measured more than
the trivial or formal side of education.

¹ P. W. Horn, Report of Supplementary Survey of Portland Public Schools,
pp. 6-7. April, 1917.

The hope of the measurement movement is, however, to do more 
than to deny the validity of such criticism as Mr. Horn makes. 
There is to be progress in covering more fully the details of school 
work. Today we know how to measure many aspects of teaching. 
The reason for our early attack on the formal elements is that these 
yield readily to analysis and thus to theoretical isolation and exact 
treatment. What we need to do is to carry our analysis further 
and then new measurements will become easier. 

A few years ago reading tests seemed impossible. Today we 
have mastered the distinction between oral and silent reading. We 
have good methods of measuring some of the more common types 
of deficiency and we know the rate of progress which is normal in 
the more obvious phases of interpretation. The progress in this 
field within a single year is so large that there is nothing but opti- 
mism in the minds of those carrying on the work. What we need 
is more interest on the part of practical workers and more experi- 
mentation with methods. 

Those of us who have watched the progress of measurement 
will recall distinctly that the earliest critics of the movement were 
more emphatic than the present-day critics in declaring that school 
results could not be measured. This type of criticism was the one 
with which Mr. Rice’s opponents thought they had forever elimi- 
nated him and his type of work from the schools. Steadily the 
range of measurements has broadened. Steadily the productivity 
of the movement has increased. It is not for the advocate of the 
movement to prophesy its limits ; it is perfectly safe for him, how- 
ever, to assure all the world that the end is not yet in sight. So 
long as advantage comes from the pushing forward of this move- 
ment, so long as ingenuity is at hand to devise new modes of pro- 
cedure, the answer to the objection that measurement is limited to 
a few trivial aspects of teaching is steadily becoming more cogent. 

This hopeful conclusion is fully supported by one fact which 
serves at the same time to reveal one of the most important
advantages of measurement, namely, the fact that with the
development of measurement there is coming into education a greater
general clearness and definiteness of purpose.

An example will show what is meant. In the high school of 
Kansas City, Kans., there is a system of telling the students defi- 
nitely and in detail what they must do if they want to secure the 
higher grades, A or B, in a given course. The very fact that students
in that school have all along been given A, B, and C shows that 
measuring of all sorts of intellectual and moral qualities has been 
going on. The interesting fact is that in most places the measuring 
is vague and often unsatisfactory, because no one has taken the 
pains to define what is wanted. Students know that teachers are 
often arbitrary, and, be it confessed, teachers also know that they 
are vague. The system referred to above removes some of the 
ambiguities. It improves the measuring system, making it definite 
and exact, because it analyzes and defines the elements of work 
demanded. 

Suppose that the teachers of a school should concentrate for 
half a year on cultivating the power of concentration of attention. 
Is there any doubt that much new information would be gained 
about concentration and that there would be more accurate methods 
of determining its degree? Measurement will be extended in the 
future. The reasons why one can be so sure about this statement 
are to be found in the history of the past few years. 

The third prophecy that can be made is, therefore, that the 
scope of measurement will be widened until it is sufficiently in- 
clusive to satisfy even the most exacting critic. Concentration of 
attention, ability to attack various kinds of problems, clearness of 
insight, power of inference in various fields will be measured. The 
demand again is for workers who will give themselves the training in 
analysis and take the pains in collecting material that is necessary 
to bring about this consummation. 

It may be unscientific to prophesy about the remoter social 
consequences of such a movement as we are discussing, but certain 
final observations may serve to show why the advocates of measure- 
ment in education are unlimitedly optimistic. The time is rapidly 
passing when the reformer can praise his new devices and offer as 
the reason for his satisfaction, his personal observation of what was 
accomplished. The superintendent who reports to his board on
the basis of mere opinion is rapidly becoming a relic of an earlier 
and unscientific age. There are indications that even the principals 
of elementary schools are beginning to study their schools by exact 
methods and are basing their supervision on the results of their 
measurements of what teachers accomplish. A social change of 
this kind is adequate justification for any movement and a sufficient 
guarantee for its continuance. 



CHAPTER XIII

EDNA BRYNER

Russell Sage Foundation, New York City

A SELECTED BIBLIOGRAPHY OF CERTAIN PHASES OF
EDUCATIONAL MEASUREMENT 


These magazine articles, bulletins, reports, books and surveys 
are grouped on the basis of similarity of general content and are 
arranged alphabetically within each division according to authors. 

Divisions 

A. Theory of Educational Measurement and Development of 
the Movement. 

B. Tests and Scales in Various School Subjects. 

C. General Reports on the Use of Tests and Scales in Schools 

D. Lists of Tests and Scales

E. Correlations between Abilities 

F. Teachers’ Measurement 

G. Articles about Surveys and Lists of Surveys 

H. City Surveys 

I. State, County and other surveys. 
NATIONAL ASSOCIATION

OF

DIRECTORS OF EDUCATIONAL RESEARCH

Constitution 
Article I 

Name. The name of this organization shall be the National Association
of Directors of Educational Research. 

Article II 

Object. The object of the Association shall be: (1) the formation of
independent departments of educational research in all systems of public in- 
struction, and (2) the promotion of the practical use of educational measure- 
ments in all educational research having for its object the improvement of the 
efficiency of the educational administration, supervision or teaching. 

Article III

Membership. Section 1. In general, membership in the Association shall
be restricted to those who are actively and mainly engaged in research work 
having for its direct purpose the evaluation of the products of educational 
training or the improvement of the efficiency of educational teaching, super- 
vision or administration. 

Sec. 2. Membership in the Association may be either regular or associate.
Regular members shall have any and all of the rights and privileges of the
Association, including the right to vote, to hold office, and to appear upon any 
formal or informal program of the Association. Associate members are not 
eligible for office, have no vote and may not appear in any formal program of 
the Association, or otherwise represent the Association in public meetings, with- 
out special invitation of the Executive Committee; but they are to receive all 
bulletins of the Association, to be notified of all meetings or other activities 
and no distinction is to be made in any informal meeting or program between 
them and regular members.

Sec. 3. Any person holding the position of Director of, or supervising, a 
Department of Educational Research in an educational institution, or any im- 
mediate assistant of such director, shall be eligible for full membership. 

Sec. 4. Any person actively engaged in research work in education, but 
holding some educational position other than in a department of research shall 
be eligible for associate membership. 

Sec. 5. The Executive Committee, through the Secretary, shall, if neces- 
sary, ask all applicants for membership to state their positions, duties and 
past achievements in measurement work, and decision as to eligibility shall 
be made by the Executive Committee. In all cases where the applicant holds
two or more positions, one of which has to do with educational research, the 
decision of the Executive Committee shall be made in accordance with the in- 
tent of the qualifications for membership as outlined above. Regular members 
who have not contributed to the bulletins of the Association during the year 
shall automatically become associate members at the annual meeting next fol- 
lowing. 
Article IV

Dues and Assessments. There shall be no regular annual dues. The
expenses incidental to carrying on the work of the Association shall be met by
an assessment voted at each regular annual meeting. 

Article V 

Officers. The officers shall consist of a president, a vice-president, and a
secretary-treasurer. These officers shall be elected at the regular annual meet- 
ing of the Association. Their duties shall be those usually performed by such 
officers.

Article VI 

Executive Committee. There shall be one executive committee of five
members consisting of the officers and the two preceding presidents whose duty
shall be the conduct of the business of the Association between meetings. 

Article VII 

Meetings. The time and place of holding the annual meeting shall be
determined by a vote of the Association. Special meetings of the Association 
or of the executive committee may be called by the president, and must be 
called by him whenever requested by a majority of the executive committee. 

Article VIII 

Amendments. Changes in this constitution may be made at any annual
meeting of the Association by the affirmative vote of two-thirds of the mem- 
bers present. 
ANNOUNCEMENT OF YEARBOOKS AND EXPLANATION

OF MEMBERSHIP IN THE NATIONAL SOCIETY 
FOR THE STUDY OF EDUCATION 

The purpose of the National Society is to promote the investi- 
gation and discussion of educational questions. Anyone who is 
interested in receiving its publications may become a member. The 
Yearbooks are issued in several Parts each year and are discussed 
at the annual meeting, which is held in February at the same time
and place as the meeting of the Department of Superintendence 
of the National Education Association. There are two types of 
membership, associate and active. Associate members pay $1.00 
annually and receive one copy of each Yearbook. Active members
pay $2.00 annually, receive two copies of each Yearbook, and are 
eligible to vote and hold office in the Society. 

The Yearbooks deal in a practical way with fundamental current
issues in instruction and school administration. The Seventeenth
Yearbook (calendar year 1918) will comprise Part I, to
contain the “Third Report of the Committee of the National Education
Association on the Economy of Time,” and Part II, to contain
thirteen chapters on “The Measurement of Educational Products,”
prepared by the National Association of Directors of Educational
Research. Both Part I and Part II may be expected in
February, 1918.

Orders for Yearbooks for 1917 or earlier or for single parts of 
the Yearbook for 1918 are handled directly as commercial sales, by 
the Public School Publishing Co., Bloomington, Illinois, at the rates
indicated on the cover of this monograph. To obtain the entire 
Yearbook for 1918 as a member of the Society, pin your check or 
postal order to the following slip, properly filled out, and mail to 
the Secretary now. 
To Guy M. Whipple,
Secretary of the National Society for the Study of Education

Please enroll me as an active / associate member.

I inclose $2.00 as payment of active dues for calendar year 1918
       or $1.00 as payment of associate dues for calendar year 1918